Method for real time relevancy determination of terms

ABSTRACT

A method for determining relevancy of real time received terms, the method includes the steps of: determining relevancy keywords; extracting real time terms from currently received information streams; updating current reception patterns of relevancy keywords in response to a comparison between the extracted real time terms and the relevancy keywords; and determining a relevancy of relevancy keywords in response to a comparison between current reception patterns and reference reception patterns.

RELATED APPLICATIONS

This application claims the benefit of priority under 35 U.S.C. §120 as a continuation application of U.S. patent application Ser. No. 10/071,155, filed Feb. 7, 2002, herein incorporated by reference in its entirety. This application also relates to U.S. patent application Ser. No. 09/481,206, filed Jan. 11, 2000, U.S. patent application Ser. No. 09/655,185, filed Sep. 5, 2000, and U.S. patent application Ser. No. 09/654,801, filed Sep. 5, 2000.

FIELD OF THE INVENTION

The present invention generally relates to real time relevancy systems and a method for calculating the relevancy value of real time information.

BACKGROUND OF THE INVENTION

At the beginning of the third millennium, a client can receive a very large amount of information, such as real time information, from many information sources. Commonly, a client has a personal computer, a cellular phone, a laptop computer or another type of computerized device that is coupled to information sources over various networks, including cellular networks, cable networks, broadband networks and the like. Some of said networks form a part of the Internet.

Various data processing schemes were offered for handling and managing the vast amount of information. Many prior art methods and systems allow for matching information to predefined queries.

There is a need to improve the systems and methods for processing real time information that is provided over data and media networks.

There is a need to provide systems and methods for processing real time information in response to the behavior patterns of data over these networks.

There is a need to provide an adjustable real time relevancy system and method that reflects both predefined criteria and the content of real time generated materials.

SUMMARY OF THE INVENTION

The invention provides a method for determining relevancy of real time received terms, the method includes the steps of: determining relevancy keywords; extracting real time terms from currently received information streams; updating current reception patterns of relevancy keywords in response to a comparison between the extracted real time terms and the relevancy keywords; and determining a relevancy of relevancy keywords in response to a comparison between current reception patterns and reference reception patterns.

The at least one relevancy keyword can be extracted from a client query or from a client alarm criterion, and may include a single word, a single term, a combination of words and a combination of terms. The query terms and alert terms may be extracted and provided to a relevancy determination unit by an alert module and a search engine.

The method may also include a step of estimating flow patterns of the received information streams. The current reception patterns of relevancy keywords may be further responsive to the estimated flow patterns of the received information streams. The step of estimating flow patterns may include monitoring the reception of flow keywords, or any portion of the received information streams. Flow keywords may be predefined words but usually include commonly used words. The step of estimating the flow may also be done by other methods known in the art, such as monitoring the bit rate of active media sources and the duration of transmission, but this is not necessarily so.

According to another aspect of the invention the flow estimation and the relevancy value are also responsive to the source of the information. Accordingly, each extracted term may be evaluated in response to a predefined weight factor associated with the origin of the extracted term.

The information packets may comprise content such as, but not limited to, text, audio, video, multimedia, executable code and streaming media.

The method may also include compensating for time differences resulting from a reception of information streams from distinct geographical locations.

The method may further include a step of compensating for time differences resulting from a reception of information streams relating to events that occur at distinct geographical locations.

The current reception patterns may reflect the reception of relevancy keywords during a test period or even during at least two test periods. The at least two test periods may at least partially overlap, but this is not necessarily so. Each test period of the at least two test periods is characterized by a corresponding current reception pattern. The corresponding current reception patterns are compared to the reference reception pattern. Conveniently, each comparison out of the at least two comparisons provides a comparison result and the determination of the relevancy value is responsive to a combination of the at least one comparison result. It is noted that the reference reception pattern reflects the reception of a relevancy keyword during a time period that is much longer than each of the test periods, but this is not necessarily so.

The step of determining a relevancy of relevancy keywords comprises attaching a relevancy level to relevancy keywords. The relevancy levels are defined by relevancy value thresholds.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:

FIG. 1 a is a simplified illustration of the environment in which the relevancy determination unit is operating, in accordance with a preferred embodiment of the present disclosure;

FIG. 1 b is an illustration of a relevancy determination unit 2, in accordance with a preferred embodiment of the present disclosure;

FIG. 2 is a simplified block diagram that illustrates the operation of an alert module in association with related modules and data structures, in accordance with a preferred embodiment of the present disclosure;

FIG. 3 is a simplified block diagram that illustrates the structure of the alerts index tables, in accordance with a preferred embodiment of the present disclosure;

FIG. 4 is a simplified block diagram that illustrates the operation of a search engine in association with related modules and data structures, in accordance with a preferred embodiment of the present disclosure;

FIG. 5 is a simplified block diagram that illustrates the structure of the terms index tables, in accordance with a preferred embodiment of the present disclosure;

FIGS. 6-8 are flow charts illustrating a method for real time alert, in accordance with a preferred embodiment of the invention;

FIGS. 9-10 are flow charts illustrating a method for real time search, in accordance with a preferred embodiment of the invention;

FIG. 11 is a flow chart illustrating a method for determining a relevancy of a keyword, in accordance with a preferred embodiment of the invention; and

FIG. 12 illustrates a media screen illustrating relevancy values of relevancy keywords, in accordance with a preferred embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

It should be noted that the particular terms and expressions employed and the particular structural and operational details disclosed in the detailed description and accompanying drawings are for illustrative purposes only and are not intended to in any way limit the scope of the invention as described in the appended claims.

The Environment of the Relevancy Determination Unit

FIG. 1 a describes system 1, in which relevancy determination unit 2 operates, according to a preferred embodiment of the invention. System 1 includes distribution means 4, analysis means 5, retrieval means 6, and a database of documents 3.

Client systems 7, 8, 9, 10, 11 and 12 provide client queries to system 1. Client systems are coupled to system 1 via a network and a plurality of interfaces, such as interfaces 13, 14 and 15. For convenience of explanation it is assumed that client system 7 is a personal computer system, client system 8 is a cellular phone, client system 9 is a PDA, client system 10 is a set top box coupled to a digital television, and client system 11 is adapted to receive electronic mail. Accordingly, interfaces 13-15 are adapted to provide query results in various formats, according to various communication protocols, such as the TCP/IP protocol. For example, client system 8 can receive query results and alerts in WAP format. Usually, a client system receives a query result that includes text, an audio stream, and a video stream. Such a query result often includes a URL address, for allowing a client system to access desired information via a network such as the Internet.

It is assumed that a client system can provide a client query and/or can update an alert criterion. System 1 accordingly provides said client system with a query result and/or an alert.

Conveniently, distribution means 4 includes interfaces 13-15, client manager 18, dispatcher 17, history manager 21, query and alert manager 19 and data builder 20. Client manager 18 holds client profiles. A client profile can indicate which queries were provided by the client system, at least one format in which a query result and/or an alert is to be sent to a client system, a client identifier (ID), and a list of alert criteria. Client manager 18 manages user profiles and provides queries or alert criteria to alert module 3 via query and alert manager 19. Each query or alert criterion is associated with said client ID. Conveniently, client manager 18 holds a table for mapping alerts to client systems.

Distribution means 4 interfaces between clients and the analysis means 5. Dispatcher 17 and interfaces 13-15 are adapted to receive client queries and/or alert criteria from client systems 7-8, to update client profiles and send said client queries/alert criteria to analysis means 5. Query results and/or alerts are generated by analysis means 5 and dispatched to client systems by distribution means 4.

Dispatcher 17 receives updated alert criteria and/or client queries from client manager 18 and provides them to query and alert manager 19. Dispatcher 17 receives alerts and query results and, in association with client manager 18, determines to which client system to send said alert and/or query result and in what format. Said alert and/or query results are provided to one of interfaces 13-15 and to the appropriate client systems. Dispatcher 17 receives query results and alerts from analysis means 5 via query and alert manager 19. In response to a reception of an alert or a query result, dispatcher 17, in association with client manager 18, determines which information to include in a query result or alert to be sent to a client system. Accordingly, a content object request is sent to data builder 20.

Relevancy determination unit 2 is operable to determine the relevancy of multiple keywords in response to the reception of the keywords. The keywords can be either statically or dynamically selected. It is noted that the term “keyword” is used to describe a single word, a single term, a combination of terms and a combination of words. According to some aspects of the invention the keywords may include (i) queries provided by clients, and/or (ii) query terms, and/or (iii) alert criteria provided by clients, and/or (iv) alert terms, but this is not necessarily so.

Relevancy determination unit 2 can receive incoming data streams from retrieval means 6 and process and filter them to provide real time terms that are matched against the keywords, but it can also rely on the filtering and processing mechanisms within search engine 26 and alert module 3. If, for example, a relevancy keyword is a client query, the reception of the keyword is detected by search engine 26. It is noted that if the flow estimation is based upon the overall reception of keywords (i.e., the flow estimation keywords are the relevancy keywords) then the flow estimation may be made by relevancy determination unit 2 in conjunction with search engine 26 and alert module 3. It is noted that relevancy determination unit 2 may be coupled to various agents and to client manager 19.

According to an aspect of the invention relevancy determination unit 2 is also operable to receive flow estimate information from flow estimation unit 410 and time zone information from time zone unit 412. Time zone unit 412 and flow estimation unit 410 may be coupled to various agents, such as agents 24, 27 and 28, but this is not necessarily so.

Flow estimation unit 410 estimates the amount of incoming traffic or an amount of a predefined portion of the incoming traffic. The traffic estimate may reflect the amount of predefined flow estimation keywords that were received during a predefined time period. The flow estimation unit may have its own configurable filtering systems for extracting the predefined flow estimation keywords, but it can also receive such information from alert module 3 (when the predefined flow estimation keywords are also defined as alert terms) or from search engine 26 (when the predefined flow estimation keywords are also defined as query terms).

According to an aspect of the invention the predefined flow estimation keywords are not necessarily correlated with the alert and query terms, and may even be terms that are filtered out by alert module 3 or search engine 26. The flow estimation keywords are usually frequently used words, such as words that are discarded by terms filter 49 of FIG. 2. Flow estimation unit 410 may be coupled to the agents of retrieval means 6 or to retrieval management and prioritization component 29. Retrieval management and prioritization component 29 is operable to perform load balancing and may determine the load from the same inputs as flow estimation unit 410. Flow estimation information may be utilized for compensating for differences in the information exchange patterns of clients during distinct time periods. For example, during weekends and holidays the overall flow of data streams decreases in comparison to working days. Furthermore, less data is exchanged during the night.
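The compensation described above can be illustrated by a minimal sketch in Python. The sketch assumes that the flow estimate is a simple count of frequently used words and that a keyword count is normalized by dividing it by that flow count; the word list `FLOW_KEYWORDS` and the function names are hypothetical and are not taken from the disclosure.

```python
# Hypothetical set of frequently used words that serve as flow keywords.
FLOW_KEYWORDS = frozenset({"the", "and", "of", "to", "in"})


def count_flow_keywords(texts):
    """Count occurrences of flow keywords in an iterable of text portions."""
    return sum(
        1
        for text in texts
        for word in text.lower().split()
        if word in FLOW_KEYWORDS
    )


def normalize(keyword_count, flow_count):
    """Express a keyword count relative to the overall flow (assumed ratio)."""
    if flow_count == 0:
        return None  # Assumption: no flow means no meaningful estimate.
    return keyword_count / flow_count
```

Under these assumptions, thirty receptions of a keyword on a working day with a flow count of 3000 and ten receptions on a weekend day with a flow count of 1000 yield the same normalized value, so the weekend decrease in traffic does not by itself appear as a decrease in relevancy.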

Time zone unit 412 estimates the local time of an event that is described in a data stream. The determination is based upon the content of the data stream, and usually depends upon location/geographical information, such as the name of a city, country and the like in which an event takes place. The geographic information can be determined from the identity of the person, company or other entity that may be included within the data stream. The determination may also be based upon the source of information, especially when the source of information usually provides information relating to a known geographical area.
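A time zone estimation of this kind may be sketched as follows. The sketch is illustrative only: the lookup table `LOCATION_UTC_OFFSET_HOURS` and the function `estimate_event_local_time` are hypothetical, the offsets are fixed and ignore daylight saving time, and the location is found by plain substring matching rather than by any method stated in the disclosure.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table mapping place names to fixed UTC offsets, in hours.
LOCATION_UTC_OFFSET_HOURS = {"tokyo": 9, "new york": -5, "london": 0}


def estimate_event_local_time(text, received_utc):
    """Return the reception time expressed in the local time of the first
    known place named in the text, or None when no known place is named.

    received_utc must be a timezone-aware datetime.
    """
    lowered = text.lower()
    for place, offset in LOCATION_UTC_OFFSET_HOURS.items():
        if place in lowered:
            local_zone = timezone(timedelta(hours=offset))
            return received_utc.astimezone(local_zone)
    return None
```

For example, a stream received at 12:00 UTC that mentions Tokyo would be attributed a local time of 21:00 under the assumed table.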

Data builder 20 accesses data manager 22 and provides the dispatcher with the requested information. For example, an alert can indicate that information source 30 provided at least one matching information packet that matches an alert criterion of client system 10. Dispatcher 17 receives said alert and determines, in association with client manager 18, that the alert should contain additional information from the matching information source 30, such as a multimedia stream that was broadcast by information source 30, whereas the matching information packets were derived from said multimedia stream.

Dispatcher 17 sends data builder 20 a content object request to receive said multimedia stream. Said request usually determines the matching information ID and a content type/alert or query result format. Said multimedia stream is stored at a certain address within data manager 22, or in an external multimedia server (not shown). Said content object request is a request to receive said address. Said address is provided to dispatcher 17 and, via interface 13 and network 16, to client system 10. Eventually, said multimedia stream is displayed to the client. It is noted that the relevancy levels of relevancy keywords that appear within the displayed multimedia streams are reflected in various manners, such as, but not limited to, the color of the relevancy keyword, the color of the background of the relevancy keywords and the like.

Conveniently, distribution means 4 maintains a list of distributor identification (ID), distributor type and user counter for each alert.

Client manager 18 is adapted to manage client system information such as client system profile, preferences, and alert criteria.

History manager 21 is adapted to maintain alert criteria and requests to update said criteria for client retrieval. History manager 21 receives requests to update an alert criterion from dispatcher 17 and stores said requests, for allowing a client system to view said requests.

Query and alert manager 19 is operable to route client queries and alert criteria updates from dispatcher 17 and to route query results and alerts from analysis means 5 to dispatcher 17.

Retrieval means 6 includes a plurality of agents or receptors, such as agents 24, 27 and 28. Said agents are coupled to various information sources, such as information sources 30-36, via networks or via media. Agents 24, 27 and 28 are adapted to receive information from various information sources, such as television channel 30, radio channel 31, news provider 32, web sites 33, IRC servers 34, bulletin boards 35 and streaming media provider 36, and provide information packets to analysis means 5. For example, agent 24 receives television broadcasts or video streams via cable network 37 and converts the television broadcast or video stream to a stream of information packets. Agent 24 can include a dedicated encoder, a device for extracting closed captions out of said video stream, or picture recognition and analysis means. Agent 27 receives radio broadcasts, transmitted by radio channel 31 over a wireless medium, and converts said transmitted audio stream to a stream of information packets. Agent 28 is coupled, via network 38, to news provider 32, web sites 33, IRC servers 34 and bulletin boards 35 for retrieving information packets transmitted from said information sources. Retrieval means 6 further includes retrieval management and prioritization component 29 for prioritizing content sources and channels and for balancing the load between agents/receptors.

Alert module 3 is adapted to receive alert criteria from query and alert manager 19 and to constantly match said alert criteria against portions of received information packets, said information packets provided by retrieval means 6. When an alert criterion is fulfilled, an alert indication is provided to query and alert manager 19. Conveniently, said alert indication includes a query ID and an information packet ID. Dispatcher 17 receives said alert indication and accesses client manager 18 to determine which client system is to receive an alert, what additional information to provide to said client system and in what format to send the alert to said client system. Accordingly, dispatcher 17 sends a result object request to data builder 20. Data builder 20 accesses data manager 22, receives the additional information, provides said information to dispatcher 17, and provides an alert to a client system, via an interface and network 16.
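The matching step may be sketched as follows. The sketch assumes, purely for illustration, that each alert criterion is a set of terms that must all appear in the terms extracted from one information packet; the names `AlertIndication` and `match_alerts` are hypothetical and do not describe the actual index tables of alert module 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertIndication:
    """Pairs a query ID with the ID of the matching information packet."""
    query_id: str
    packet_id: str


def match_alerts(alert_criteria, packet_id, packet_terms):
    """Return an indication for every criterion fulfilled by the packet.

    alert_criteria maps a query ID to the set of terms that must all be
    present in the packet (an assumed AND semantics).
    """
    terms = set(packet_terms)
    return [
        AlertIndication(query_id, packet_id)
        for query_id, required in alert_criteria.items()
        if required <= terms
    ]
```

Each returned indication carries the query ID and the information packet ID, which is the information the dispatcher is described as using to select the client system and the alert format.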

Data manager 22 is adapted to store received information packets, audio streams and video streams. Optionally, data manager 22 is further adapted to allow data clients to get notification on data events such as data changes, data expiration, etc. and is further adapted to allow data providers to register as such.

Alert module 3 allows generating alerts in real time, in response to previously provided alert criteria and information packets being received in real time. Alert module 3 is adapted to support various alerts, such as Boolean alerts and best effort alerts.

Search engine 26 allows generating query results in real time. Search engine 26 is adapted to support various searching techniques, such as Boolean search and best effort search.

Classification module 25 is adapted to dynamically classify information streams/groups of information packets. Classification module 25 dynamically determines a topic of a channel, thus allowing searches and alerts based upon a topic of an information stream.

Relevancy Calculations

Some relevancy calculations are described below. It is noted that the provision of relevancy keywords, the determination of when a relevancy keyword is received and the determination and selection of flow keywords are explained in greater detail with reference to FIGS. 2-8, which illustrate the structure and operation of alert module 3 and search engine 26.

Generally speaking, the reception of each relevancy keyword is constantly monitored, and compared to previous reception patterns of that relevancy keyword.

The comparison results in a determination of the relevancy of each relevancy keyword. As previously mentioned, the reception may also be responsive to the flow patterns of received data streams and to the time at which the data was received.

Conveniently, the relevancy realm is partitioned into relevancy levels. The number of levels and the partition between the various levels may vary. For convenience of explanation it is assumed that (i) there are nine relevancy levels, (ii) the previous reception pattern is determined during a period of sixty days, (iii) a current reception pattern reflects the reception of the relevancy keyword during a test period of either twenty-four or twelve hours, (iv) the reception patterns are normalized in response to a flow estimation that is based upon the reception of flow keywords, (v) the reception patterns are reflected by an average amount of receptions during the period of sixty days and by a standard deviation of the daily averages during each day of the period of sixty days, (vi) the previous reception pattern is updated once a day, and (vii) the test period is in a form of a “sliding window” that ends at the current time. It is noted that other periods/“windows”, and even non-consecutive sequences of periods, may be taken into account.

The following first set of equations illustrates relevancy level thresholds for a test period of twenty-four hours, while the second set of equations illustrates relevancy level thresholds for a test period of twelve hours.

The two sets of equations illustrate nine relevancy levels. It is noted that each relevancy keyword is characterized by a keyword reference population and by 24_hour and 12_hour normalized keyword current reception values. The keyword reference population includes samples that reflect the reception of that relevancy keyword during a period of sixty days in relation to an aggregate amount of reception of the flow keywords during these sixty days.

A 24_hour normalized keyword current reception value (also denoted 24 hrv) is a ratio between the amount of reception of that relevancy keyword during the last twenty-four hours and the total amount of flow keywords received during these twenty-four hours. A 12_hour normalized keyword current reception value (also denoted 12 hrv) is a ratio between the amount of reception of that relevancy keyword during the last twelve hours and the total amount of flow keywords received during these twelve hours.
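The two ratios can be sketched in Python as follows. The sketch assumes that reception times are kept as `datetime` values and that the test period is a sliding window ending at the current time; the function name `normalized_reception_value` and the handling of an empty flow count are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def normalized_reception_value(keyword_times, flow_times, now, window_hours):
    """Ratio of keyword receptions to flow keyword receptions within the
    sliding window of window_hours that ends at now."""
    start = now - timedelta(hours=window_hours)
    keyword_count = sum(1 for t in keyword_times if start < t <= now)
    flow_count = sum(1 for t in flow_times if start < t <= now)
    if flow_count == 0:
        return None  # Assumption: the ratio is undefined without flow data.
    return keyword_count / flow_count
```

Calling the function with `window_hours=24` gives 24 hrv and with `window_hours=12` gives 12 hrv for the same keyword and the same reception records.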

According to an aspect of the invention the relevancy level is determined in response to a single comparison, either between the 12_hour normalized keyword current reception value (also referred to as 12 hrv) and the thresholds of the second set of equations or between the 24_hour normalized keyword current reception value (also referred to as 24 hrv) and the thresholds of the first set of equations. It is noted that the relevancy value determination may be responsive to a combination of both comparisons, such as but not limited to a linear combination, a non-linear combination, an average of those values, or a maximal value out of the two values.

The nine relevancy levels are numbered −4, −3, −2, −1, 0, 1, 2, 3 and 4, whereas a zero relevancy level reflects a relevancy keyword that is received in accordance with previous reception patterns, the positive relevancy levels reflect relevancy keywords that are received more often than their previous reception patterns, and vice versa.

The term “avg” as used in the following equations is the average of the normalized keyword reception value during the sixty day period.

The term “std” as used in the following equations is the standard deviation of the normalized keyword reception value during the sixty day period.
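As a hedged illustration, avg and std may be computed from daily samples as sketched below. The sketch assumes one normalized sample per day (the daily keyword count divided by the daily flow keyword count), skips days with no flow data, and uses the population standard deviation; the disclosure does not state whether a population or a sample standard deviation is intended, and the function name `reference_statistics` is hypothetical.

```python
from statistics import mean, pstdev


def reference_statistics(daily_keyword_counts, daily_flow_counts):
    """Return (avg, std) of the daily normalized keyword reception values.

    Days with a zero flow count are skipped, which is an assumption that
    mirrors ignoring periods for which no reception data is available.
    """
    samples = [
        keyword / flow
        for keyword, flow in zip(daily_keyword_counts, daily_flow_counts)
        if flow > 0
    ]
    if not samples:
        raise ValueError("no daily samples with a non-zero flow count")
    return mean(samples), pstdev(samples)
```

In the described embodiment the two input sequences would each hold sixty daily values, and the result would be recomputed once a day.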

For a 24 hour period the relevancy of each relevancy keyword is determined by:

-   -   (1.1) Relevancy level is −4 if 24 hrv is equal to or smaller than avg−std.
-   -   (1.2) Relevancy level is −3 if 24 hrv is greater than avg−std but smaller than or equal to avg−0.8×std.
-   -   (1.3) Relevancy level is −2 if 24 hrv is greater than avg−0.8×std but smaller than or equal to avg−0.65×std.
-   -   (1.4) Relevancy level is −1 if 24 hrv is greater than avg−0.65×std but smaller than or equal to avg−0.5×std.
-   -   (1.5) Relevancy level is 0 if 24 hrv is greater than avg−0.5×std but smaller than or equal to

${avg} + {\left( {0.25 + \frac{0.25}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (1.6) Relevancy level is 1 if 24 hrv is greater than

${avg} + {\left( {0.25 + \frac{0.25}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {0.85 + \frac{0.5}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (1.7) Relevancy level is 2 if 24 hrv is greater than

${avg} + {\left( {0.85 + \frac{0.5}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {1.5 + \frac{0.75}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (1.8) Relevancy level is 3 if 24 hrv is greater than

${avg} + {\left( {1.5 + \frac{0.75}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {2.2 + \frac{1}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (1.9) Relevancy level is 4 if 24 hrv is greater than

${avg} + {\left( {2.2 + \frac{1}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

For a 12 hour period the relevancy of each relevancy keyword is determined by:

-   -   (2.1) Relevancy level is −4 if 12 hrv is equal to or smaller than avg−1.2×std.
-   -   (2.2) Relevancy level is −3 if 12 hrv is greater than avg−1.2×std but smaller than or equal to avg−1×std.
-   -   (2.3) Relevancy level is −2 if 12 hrv is greater than avg−1×std but smaller than or equal to avg−0.85×std.
-   -   (2.4) Relevancy level is −1 if 12 hrv is greater than avg−0.85×std but smaller than or equal to avg−0.7×std.
-   -   (2.5) Relevancy level is 0 if 12 hrv is greater than avg−0.7×std but smaller than or equal to

${avg} + {\left( {0.45 + \frac{0.45}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (2.6) Relevancy level is 1 if 12 hrv is greater than

${avg} + {\left( {0.45 + \frac{0.45}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {1.05 + \frac{0.7}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (2.7) Relevancy level is 2 if 12 hrv is greater than

${avg} + {\left( {1.05 + \frac{0.7}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {1.7 + \frac{0.95}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (2.8) Relevancy level is 3 if 12 hrv is greater than

${avg} + {\left( {1.7 + \frac{0.95}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {std}}$

but smaller than or equal to

${avg} + {\left( {2.4 + \frac{1.2}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$

-   -   (2.9) Relevancy level is 4 if 12 hrv is greater than

${avg} + {\left( {2.4 + \frac{1.2}{\ln \left( {1.05 + {avg}} \right)}} \right) \times {{std}.}}$
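The two sets of thresholds can be expressed as a single table-driven routine, sketched below in Python. The coefficients are copied from equations (1.1)-(1.9) and (2.1)-(2.9); the names `COEFFS`, `relevancy_level` and `combined_level`, and the choice of the maximum as the default combination, are assumptions made for illustration (the maximum is one of the combinations mentioned in the description).

```python
import math

# For each test period: multipliers m of the lower thresholds avg - m*std,
# and pairs (a, b) of the upper thresholds avg + (a + b/ln(1.05+avg))*std.
COEFFS = {
    24: ([1.0, 0.8, 0.65, 0.5],
         [(0.25, 0.25), (0.85, 0.5), (1.5, 0.75), (2.2, 1.0)]),
    12: ([1.2, 1.0, 0.85, 0.7],
         [(0.45, 0.45), (1.05, 0.7), (1.7, 0.95), (2.4, 1.2)]),
}


def relevancy_level(hrv, avg, std, period_hours):
    """Map a normalized current reception value to a level from -4 to 4."""
    lower, upper = COEFFS[period_hours]
    log_term = math.log(1.05 + avg)  # Positive whenever avg >= 0.
    bounds = [avg - m * std for m in lower]
    bounds += [avg + (a + b / log_term) * std for a, b in upper]
    # bounds holds the inclusive upper limits of levels -4 through 3.
    for level, bound in zip(range(-4, 4), bounds):
        if hrv <= bound:
            return level
    return 4


def combined_level(hrv24, hrv12, avg, std, combine=max):
    """Combine the 24 hour and 12 hour levels (assumed default: maximum)."""
    return combine(
        relevancy_level(hrv24, avg, std, 24),
        relevancy_level(hrv12, avg, std, 12),
    )
```

For example, with avg of 0.1 and std of 0.02, a value of 0.15 falls in level 1 under the 24 hour thresholds and in level 0 under the wider 12 hour thresholds, so the maximum combination yields level 1.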

Relevancy Determination Unit

FIG. 1 b illustrates various optional modules/portions of relevancy determination unit 2.

It is noted that relevancy determination unit 2 may have its own filtering and processing capabilities, such as those of alert module 3 or search engine 26, for allowing the extraction of terms from received data streams and a comparison of the extracted terms to relevancy keywords and flow keywords. FIG. 1 b illustrates a relevancy determination unit 2 in a scenario where alert terms and query terms are the relevancy keywords and are provided by alert module 3 and search engine 26.

Relevancy determination unit 2 has a plurality of interfaces, such as first interface 405 for receiving information from search engine 26, second interface 406 for receiving information from alert module 3, fourth interface 407 for receiving information from time zone unit 412 and fifth interface 408 for receiving information from flow estimation unit 410. Relevancy determination unit 2 also has processor 400 for calculating current reception patterns and previous reception patterns in response to the reception of information relating to the reception of relevancy keywords, and a storage unit 404, coupled to the first interface and the processor, for storing current reception patterns, previous reception patterns and information relating to the reception of relevancy keywords. Storage unit 404 stores relevancy keyword table 402.

Whenever search engine 26 detects that a query term was received it updates relevancy determination unit 2, and whenever alert module 3 detects that an alert term was received it updates relevancy determination unit 2.

Whenever a client updates an alert criterion or provides a query, the update (the alert terms that form the alert criterion or the query terms, accordingly) is provided to relevancy determination unit 2, which updates its relevancy keyword database. If the relevancy keywords are also flow keywords, the flow keyword database is also updated.

Relevancy determination unit 2 differs from search engine 26 and alert module 3 in that it stores information about the reception of a relevancy keyword for up to sixty days from the last reception of the relevancy keyword. Accordingly, even after a query term is deleted from search engine 26 and even after an alert term is deleted from alert module 3, the keyword and its statistics still remain.

The relevancy keywords are stored in a relevancy keyword table 402. Relevancy keyword table 402 comprises entries whose keys are terms. Therefore, relevancy keyword table 402 provides fast access to the entries by using terms as access keys. Said structure also provides for fast insertion of terms into the table. Each entry of relevancy keyword table 402 stores both reference statistics and test period data of the relevant relevancy keyword. For example, an entry of a relevancy keyword may store the amount of reception of the relevancy keyword during the current test period, and also store statistics reflecting the reception of the relevancy keyword during the reference period of sixty days. The time of reception (or modified time of reception in response to time zone information) is stored until the test period is “moved” such as to place the reception time outside the test period.
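A table of this kind may be sketched as a hash map keyed by term, as shown below. The sketch is illustrative only: the class and field names are hypothetical, reception times are assumed to be recorded in chronological order, and the reference statistics are held as plain avg and std fields that some other routine updates once a day.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class KeywordEntry:
    """Reference statistics plus reception times of the current test period."""
    avg: float = 0.0
    std: float = 0.0
    receptions: deque = field(default_factory=deque)


class RelevancyKeywordTable:
    def __init__(self, test_period=timedelta(hours=24)):
        self.test_period = test_period
        self.entries = {}  # Term -> KeywordEntry; terms are the access keys.

    def add_keyword(self, term):
        """Insert a term if it is not yet present."""
        self.entries.setdefault(term, KeywordEntry())

    def record_reception(self, term, when):
        """Store a reception time; return False for an unknown term."""
        entry = self.entries.get(term)
        if entry is None:
            return False
        entry.receptions.append(when)
        return True

    def current_count(self, term, now):
        """Drop times that slid out of the test period and count the rest."""
        entry = self.entries[term]
        cutoff = now - self.test_period
        while entry.receptions and entry.receptions[0] <= cutoff:
            entry.receptions.popleft()
        return len(entry.receptions)
```

Keying the table by term gives constant expected time for both lookup and insertion, and discarding old reception times only when the window is read keeps each update inexpensive.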

It is noted that there may be time periods, during the reference period,during which the relevancy determination unit 2 does not receive anyindication of a reception of a relevancy keyword. This may occur when aquery term is deleted or an alert criterion is altered. In such a casethese periods are not taken into account in the reception statistics.This may be avoided if these query terms are still stored and comparedto incoming extracted terms during the sixty days period.

Assuming that the relevancy keywords are query terms and alert terms, then whenever they are updated the relevancy keyword table 402 either adds an entry to the table or updates the statistics of an entry.

In a periodical manner, the content of relevancy keyword table 402 is scanned and processed by processor 400 to update the relevancy statistics. The relevancy statistics are responsive to flow statistics, as provided by flow estimation unit 410. Preferably, the flow statistics are provided by either alert module 3 or search engine 26, which filter out frequently used words (and at the same time update relevancy determination unit 2).

The determination of relevancy levels of relevancy keywords is followed by a step of updating clients, and especially clients that provided the query terms or the alert terms. The update may be in a graphical form, such as to paint or otherwise emphasize query terms that are displayed on the display unit of a client. The update is provided to clients by dispatching means, as the query results and alarms are provided to these clients.

Alert Module

Patent application titled “System and Method for Alerts”, Ser. No. 09/654,801, filed on Sep. 5, 2000 and assigned to eNow Inc., is incorporated in its entirety by reference.

FIG. 2 illustrates various optional modules/portions of alert module 3, such as, but not limited to, message coordinator 50, message filter 51, terms filter 49, alert criteria term filter 63 and alert criteria extractor 60.

Alert module 3 has information packet processor 53, storage means 59, storage means controller 57, alert module 55 and alert criteria module 58.

Information packet processor 53 has: message coordinator module 50 adapted to coordinate the handling of a plurality of information packets; message filter module 51 for filtering the plurality of information packets according to predefined rules; term extractor module 48 for performing parsing and stemming on said plurality of information packets; and terms filter 49 for excluding extracted terms according to predefined rules.

Storage means 59 have terms index 56 and messages buffer 52.

Alert criteria module 58 has: alert criteria coordinator module 61 to coordinate the processing of alert criteria; alert term extractor 60 to parse and stem incoming alert criteria in order to extract and process operative alert terms; and alert terms filter 63 for excluding specific alert terms in a predefined manner. Alert criteria further comprise additional information such as information defining a relationship between alert terms, a client system identifier for determining which client provided said alert criteria, a weighing factor and a similarity threshold. Said additional information is not preprocessed but stored in storage means. Preferably, said additional information is stored in an alert criteria map.

In the preferred embodiment of the present disclosure, one information source may be a television channel that provides multimedia streams that are later transformed into streams of information packet messages. It should be understood that in the following discussion of the present disclosure the general framework of television channels is used for purposes of description, not limitation. Said search engine receives text that is either associated with the content of television channels or derived from a multimedia stream provided by television stations. Text can be derived from a multimedia stream by various means such as special encoders and voice recognition means. Many television channels provide text in a closed caption format. Although information packets will be referred to as messages, and information sources will be referred to as channels in the text of this document, it will be appreciated that in different embodiments of the present disclosure other sources of information could be used, such as news channels, video channels, music channels, various Internet sites and the like. It will also be appreciated that in other embodiments of the present disclosure, the information packets processed could be, in addition to text format, in other diverse data formats such as streaming video, still pictures, sound, applets and the like.

The messages from the various channels are retrieved by retrieval means 6 and eventually provided to alert module 3. The messages are received by Messages Coordinator Module 50 for processing. The messages transferred consist of control data such as channel ID, message ID and a timestamp of the time of arrival, and information content such as a phrase, a sentence, a news item, a music item or a video item.

Messages Coordinator 50 coordinates the handling of the incoming messages, and provides processed messages to term extractor 48 and to messages buffer 52. Messages Buffer 52 is a data structure that temporarily holds the incoming messages. In the preferred embodiment of the present disclosure Messages Buffer 52 is a cyclic buffer. Message Filter 51 filters messages according to user-defined rules. For example, messages with a specific channel ID or messages containing specific text might be blocked and discarded.

Term Extractor 48 receives the messages from Messages Coordinator 50, performs message parsing, and stemming (finding the lexicographic root) of the resulting terms. Once the message is parsed and stemmed, a list of terms within said message is created. The terms extracted are sent for further processing accompanied by identifying data such as channel ID, message ID and the message arrival time. Terms Filter 49 passes the terms through a series of filters, which can change or discard specific terms. For example, Terms Filter 49 can discard stop-words, frequently used words, one-character words, user-defined words, system-defined words such as “a”, “about”, “else”, “this”, and the like. According to an aspect of the invention frequently used words may be used for flow estimation. In such a case whenever such a word is received (and discarded) a flow indication is updated. The update may be done by relevancy determination unit 2 or by flow estimation unit 410.
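The extraction and filtering stage can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: `naive_stem` is a crude placeholder for a real stemmer, the stop-word set contains only the four examples named in the text, stop-words double as the frequently used words counted for flow estimation, and all function and field names are hypothetical.

```python
import re

# Only the example words named in the text; a real list would be longer.
STOP_WORDS = {"a", "about", "else", "this"}


def naive_stem(token):
    # Placeholder for a real stemmer: strips a few common English suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(text, channel_id, message_id, arrival_time, flow_counts=None):
    """Parse, filter and stem a message; return terms with identifying data."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            # Discarded words may still update a flow indication.
            if flow_counts is not None:
                flow_counts[token] = flow_counts.get(token, 0) + 1
            continue
        if len(token) == 1:
            continue  # one-character words are discarded
        terms.append({
            "term": naive_stem(token),
            "channel_id": channel_id,
            "message_id": message_id,
            "arrival_time": arrival_time,
        })
    return terms
```

Each surviving term carries the channel ID, message ID and arrival time, so that later stages can index it per term and delete it per message.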

Storage means controller 57 receives the at least one extracted term, accesses alert terms index 56, determines whether an extracted term out of said at least one extracted term matches an alert term stored in alert terms index 56 and accordingly updates the matching term information stored within alert terms index 56. Extracted terms that do not match any alert term are discarded. Storage means controller 57 also periodically schedules and initiates a process that removes information regarding matches between an alert criterion and irrelevant or time-decayed terms from alert terms Index 56. A description of the process is set forth hereunder.

Alert terms Index 56 consists of indexed alert terms and message identifiers that point to information relating to a reception of extracted terms that match an alert term during a predetermined period of time. Alert terms Index 56 is designed to enable fast term indexing and deletion. The indexing is done per matching term, while deletion is done per message. When the message is discarded for becoming irrelevant or time-decayed, information regarding a reception of matching terms extracted from said information packet is deleted from alert terms Index 56. Alert terms Index 56 is a means to realize alerts regarding real time content.

According to one preferred embodiment of the invention, at least a portion of a request to create or update an alert criterion passes through alert criteria coordinator 61, alert criteria terms extractor 60 and alert terms filter 63 and undergoes preprocessing steps that are analogous to the preprocessing steps of a message. An alert criterion can contain several alert terms, and associated information such as a weighing factor or a similarity threshold. Said associated information does not undergo said preprocessing steps.

Alert module 55, coupled to storage means 59, processes at least a portion of the matching extracted term information to determine whether to issue an alert, and issues at least one alert to at least one client system according to said determination. Conveniently, when a matching extracted term that matches an alert term is received, alert module 55 checks in which alert criteria said alert term is found, and processes matching extracted term information associated with said alert criteria to determine which alert criteria are fulfilled, and to which client systems to issue an alert.

According to an aspect of the invention, alert module 3 provides indications of a reception of alert terms and matches between alert criteria and received data to relevancy determination unit 2. Relevancy determination unit 2 determines whether the received alert term or alert criteria are relevancy keywords, and if so, updates the relevancy keyword statistics accordingly. It is noted that the determination of whether to send such information may be processed by an additional unit within alert module 3.

The operation of the alert module 3 will be described next. Information packets are extracted out of an incoming information stream. The messages are structured, time-stamped and transferred to the operative modules of the alert module 3. The structured messages contain control data such as channel ID, message ID and a time stamp indicative of the time of arrival, and content information such as textual data. The messages are transferred through Message Filter 51, which blocks specific messages according to predefined rules. For example, messages originating in particular channels or having specific text content or having particular characteristics could be discarded. The filtered messages are inserted into Messages Buffer 52, which is managed and synchronized by Messages Coordinator 50. Messages Coordinator 50 operates in conjunction with Messages Buffer 52, which is designed to hold the messages to be retrieved for later processing. Messages Buffer 52 is a cyclic buffer. Incoming messages are inserted at one end of Messages Buffer 52 and retrieved from the other end. The messages are kept in the buffer for a predefined period of time. Time-decayed messages may be discarded. In other embodiments of the disclosure, other methods could be used to delete messages from Messages Buffer 52, such as deletion by predefined priorities. For example, messages from a specific low-priority channel could be discarded first. When a message is deleted from Messages Buffer 52, information relating to the reception of extracted terms that were extracted from said message is deleted from the terms index. Messages Coordinator 50 provides messages to Term Extractor 48. Term Extractor 48 performs message parsing and stemming (finding the lexicographic root) of the resulting tokens and extracts the tokens from the messages. The tokens are transferred through a series of Terms Filters 49. Terms Filters 49 can change or discard a token according to predefined parameters. For example, Terms Filters 49 can discard stop-words, one-letter words, frequently used words, user-predefined words and the like. Term Extractor 48 further attaches identifiers to the tokens such as channel ID, message ID and time of arrival. Finally, Term Extractor 48 dispatches the terms to storage means controller 57. Storage means controller 57 receives at least one extracted term and accesses alert terms index 56 to determine whether an extracted term matches an alert term previously stored within alert terms index 56. If the answer is yes, storage means controller 57 updates the matching extracted term information, representative of a reception of a matching extracted term.
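The behavior of the cyclic messages buffer described above can be sketched as below. The sketch is illustrative only: the class name, the `on_delete` callback and the tuple layout are assumptions, and a `deque` is used in place of a fixed-size circular array.

```python
from collections import deque


class MessagesBuffer:
    """Hypothetical sketch of a cyclic buffer with time decay."""

    def __init__(self, capacity, max_age, on_delete):
        self.capacity = capacity    # maximum number of held messages
        self.max_age = max_age      # predefined retention period
        self.on_delete = on_delete  # called with the ID of a deleted message
        self.messages = deque()     # oldest message on the left

    def insert(self, message_id, channel_id, timestamp, text):
        self.evict_expired(timestamp)
        if len(self.messages) >= self.capacity:
            self._drop_oldest()     # buffer full: overwrite the oldest slot
        self.messages.append((message_id, channel_id, timestamp, text))

    def evict_expired(self, now):
        # Discard time-decayed messages from the retrieval end.
        while self.messages and now - self.messages[0][2] > self.max_age:
            self._drop_oldest()

    def _drop_oldest(self):
        message_id = self.messages.popleft()[0]
        # Lets the terms index remove information tied to this message.
        self.on_delete(message_id)
```

The callback models the coupling stated in the text: deleting a message from the buffer triggers deletion of the term information extracted from it.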

Conveniently, a reception of a matching extracted term initiates a process of checking at least a portion of the matching extracted information to determine whether an alert criterion was fulfilled.

Alert terms Index 56 is a data structure containing entries indexed by extracted terms and matching extracted term information.

A more detailed description of the operations related to inserting terms and removing terms from alert terms index 56 is set forth hereunder in association with the related drawing.

Clients, via dispatcher means, initiate an alert criterion and a request to update an alert criterion. Conveniently, the handling of a request to update or create an alert criterion by alert criteria module 58 is analogous to the handling of an incoming message, but portions of said request are not preprocessed in the same manner. Alert criteria are filtered by alert criteria filter 64, and handled by alert criteria coordinator 61. Alert criteria coordinator 61 functions with respect to the incoming alert criteria in a like manner to the way Messages Coordinator 50 functions with respect to the incoming messages. Alert criteria coordinator 61 receives the queries and transfers them to the alert term extractor 60. Alert term extractor 60 parses the alert criteria and stems the resulting tokens. The tokens are filtered by a series of alert criteria filters 63 and structured into alert terms by the attachment of control information such as alert criteria ID and time-stamp.

Scoring, or ranking of channels to be returned as a result, is done using a model that computes the similarity between an alert criterion and a group of information packets provided by a single information source. Some of the parameters involved in computing the results are: the total amount of terms in the channel in the predefined time interval, the number of matching terms in the channel in the predefined time interval, the total number of channels searched in the predefined time interval, the elapsed time since the last appearance of the matching term in the channel in the predefined time interval, and the position of matching terms in the channel. Additional factors for the score are: terms in proximity to the matching term, the part of speech of matching terms, and the relevant term frequency and importance in the language of the channel.

The parameters further enable alert management module 55 to rank the resulting channels, and to generate a similarity rank, to be further compared to an alert similarity threshold, in addition to standard ranking methods by the time parameter as well as by giving more weight to phrases than to a collection of single words.

Reference is now made to FIG. 3, which illustrates the structure of the alert terms index 56 tables. The alert terms Index consists of two main units: the alert terms hash 71 and the messages hash 80. Additionally, the alert terms Index contains the Channel Map unit 94.

Alert terms hash 71 comprises the alert term table 72 and the associated extracted matching terms Inverted File 73. The alert term Hash 71 comprises entries whose keys are terms. Therefore, alert term Hash 71 provides fast access to the entries by using terms as access keys. The said structure also provides for fast insertion of terms into the table. Alert term table 72 stores a plurality of alert terms, provided by client systems. Extracted matching terms inverted file 73 stores matching extracted terms information, representative of a reception of extracted terms that match alert terms during a predetermined period of time. Said extracted terms are also referred to as extracted matching terms.

The matching extracted terms inverted file 73 comprises a sorted list of matching extracted terms inverted entries map 78 and at least one of the following fields: (a) a total number of references (Total Instances) 77 to the matching extracted term in all the messages currently stored in Messages Buffer 52 of FIG. 2, (b) the modification time of the extracted matching term (Last Modification Time) 74, or (c) a number of channels that contain the extracted matching term 76. Each entry, such as entry 786 in extracted matching terms inverted entries map 78, is keyed by the channel ID 87 and has the number of references (Instances No) 88 to the extracted matching term in that channel and the time of the last appearance of the extracted matching term in the channel (Time of Last Appearance) 89. The number of references that are added to the Total Instances 77 could be used to determine the channel's relevance to a specific alert criterion.

Messages Hash 80 is indexed by Message ID 81 in order to provide fast deletion of a term's references by message. Messages Hash 80 comprises Message ID table 81 and the associated Message Data table 90. Each entry in Message Data table 90 contains information about one message and is pointed to by a Message Hash entry 81. Message Data table 90 consists of (a) the channel ID 93, (b) message time 92, and (c) Message Terms Keyed Map 91. The Message Terms Keyed Map 91 is a sorted list of Message Characteristics Entries 82. A pointer 83 keys each entry, which is unique to each term. Therefore, a Message Characteristics Entry 82 can be found easily by a specific term. Message Characteristics Entry 82 contains the following information: (a) the number of times the related extracted matching term was referred to in the relevant message (Instances No) 84, and (b) a pointer to the related Inverted File Entry 85.

The Channel Map 94 is a list sorted by channel IDs 95. For each channel ID 95, Channel Map 94 holds the total number of currently indexed extracted matching terms that belong to the channel 96. In the preferred embodiment of the present disclosure, said total number relates to the number of extracted matching terms after filtering. In a different embodiment of the present disclosure, the total number could relate to the number of extracted matching terms before filtering or to the average of both values.

The alert criteria map 100 is a list sorted by criterion IDs 98. For each alert ID 98, alert criteria map 100 holds an alert criterion. An alert criterion can hold more than a single alert term, a weighing factor given to each alert term of the alert criterion, a similarity factor and the alert term ID of each of the alert terms of said alert criterion, allowing the processing of matching extracted term information representative of a reception of terms of the alert criterion. Alert criteria map 100 is built and updated according to requests issued by client systems.

The operations supported by the alert terms index 56 of FIG. 2 will be described next. Alert terms index 56 of FIG. 2 supports three modes of operation: (1) an update, a deletion or a creation of an alert criterion, (2) extracted matching information deletion by message ID, and (3) extracted matching term information deletion by the garbage collection process.

An alert criterion is updated, deleted or created by storage means controller 57, in response to a reception of a request from a client system. The whole updated criterion is given an alert criteria ID; said ID and the alert criterion are stored in alert criteria map 100. Each alert term of the alert criterion is indexed and inserted into alert terms index 56.

Storage means controller 57 handles an update of matching extracted information when an extracted term that matches an alert term is received. Accordingly, the following sequence of steps is performed:

The alert Term 72 to extracted matching Terms Inverted File 73 link is accessed or created. A pointer to extracted matching Terms Inverted File (invertedFilePtr) is saved.

The Total Instances 77 member's value in extracted matching Terms Inverted File 73 pointed at by invertedFilePtr is increased by one.

The Last Modification Time 74 member in extracted matching Terms Inverted File 73 pointed at by invertedFilePtr is updated.

The entry for channel Id 87 in extracted matching Terms Inverted Entries Map 79 is accessed or created. A pointer to the entry is saved as invertedFileEntryPtr.

The value of Instances No 88 member in the entry pointed at by invertedFileEntryPtr is increased by one.

The appropriate Message Data is accessed or created in Message Hash 80. A pointer to the entry is saved as messageData.

The Message Characteristic Entry 82 in Message Data 90/Message Terms Keyed Map 91 is accessed by invertedFilePtr or created. A pointer to the entry is saved as messagecharac.

In the entry pointed at by messagecharac the value of Instances Number 84 member is increased by one.

In the entry pointed at by messagecharac, the invertedFileEntry pointer is set to point at invertedFileEntryPtr.

In the Message Data 90, the Message Time 92 member is updated.

In the Message Data 90 the channel ID 93 member is updated.
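The update sequence above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the class names are hypothetical, dictionary keys (term, channel ID, message ID) stand in for the pointers `invertedFilePtr`, `invertedFileEntryPtr` and `messagecharac`, and the comments map fields to the reference numerals used in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelEntry:                # entry of the inverted entries map
    instances: int = 0             # Instances No 88
    last_appearance: float = 0.0   # Time of Last Appearance 89


@dataclass
class InvertedFile:                # extracted matching terms inverted file 73
    total_instances: int = 0       # Total Instances 77
    last_modified: float = 0.0     # Last Modification Time 74
    channels: dict = field(default_factory=dict)  # channel ID -> ChannelEntry


@dataclass
class MessageData:                 # entry of Message Data table 90
    channel_id: str = ""           # channel ID 93
    message_time: float = 0.0      # Message Time 92
    terms: dict = field(default_factory=dict)  # term -> Instances No 84


class AlertTermsIndex:
    def __init__(self, alert_terms):
        # Alert term table: each alert term links to its inverted file.
        self.alert_terms = {term: InvertedFile() for term in alert_terms}
        self.messages = {}         # messages hash: message ID -> MessageData

    def update(self, term, channel_id, message_id, arrival_time):
        inverted = self.alert_terms.get(term)
        if inverted is None:
            return False           # non-matching terms are discarded
        inverted.total_instances += 1
        inverted.last_modified = arrival_time
        entry = inverted.channels.setdefault(channel_id, ChannelEntry())
        entry.instances += 1
        entry.last_appearance = arrival_time
        data = self.messages.setdefault(message_id, MessageData())
        data.terms[term] = data.terms.get(term, 0) + 1
        data.message_time = arrival_time
        data.channel_id = channel_id
        return True
```

Indexing by term in one structure and by message ID in the other is what allows insertion per matching term and deletion per message.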

A deletion of extracted matching term information representative of a reception of matching extracted terms extracted from a message occurs when a message is deleted. A message can be deleted when the Messages Buffer 52 of FIG. 2 is full or a predetermined time interval indicative of the period a message should be kept in the buffer 52 has been completed.

For extracted matching term information deletion by Message Id the following sequence of steps is performed:

The appropriate Message Terms Keyed Map 91 is obtained from Messages Hash 80.

For each Message Characteristics Entry 82 that points to extracted matching Terms Inverted File 73:

-   The pointed extracted matching Terms Inverted File 73 is accessed and Total Instances 77 member's value is decreased by the Instances No 84 member's value in Message Characteristic Entry 82.
-   The Term Inverted Entry 86 is accessed and the Instance Number 88 value is decreased by Message Characteristic Entry's local Instances No member 84 value.
-   Message Characteristic Entry 82 is deleted.
-   The three steps above are repeated until Message Terms Keyed Map 91 is empty.

The Message Id 81/Message Terms Keyed Map 91 link is deleted.
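Deletion by message ID can be sketched as below, assuming a simplified dictionary layout that is not part of the disclosure: `inverted_files` maps each term to a record with a `total` count and a per-channel `channels` count, and `messages` maps each message ID to its channel ID and per-term instance counts. The function name is hypothetical.

```python
def delete_message(inverted_files, messages, message_id):
    """Remove one message and roll back the counts it contributed.

    inverted_files: term -> {"total": int, "channels": {channel_id: int}}
    messages: message_id -> {"channel_id": str, "terms": {term: int}}
    """
    # Popping the entry also removes the message ID to keyed map link.
    data = messages.pop(message_id, None)
    if data is None:
        return False
    channel_id = data["channel_id"]
    for term, count in data["terms"].items():
        inverted = inverted_files.get(term)
        if inverted is None:
            continue
        # Total Instances is decreased by the message-local instance count.
        inverted["total"] -= count
        # The per-channel instance number is decreased by the same amount.
        if channel_id in inverted["channels"]:
            inverted["channels"][channel_id] -= count
    return True
```

Because each message records its own per-term counts, the rollback touches only the terms that the message actually contained.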

Deleting extracted matching term information not via Message Id 81 is done periodically by the garbage collecting process. The deletion is performed if the extracted matching term's last modification time occurred before a specific point in time in the past which implies that there are currently no messages that the specific extracted matching term refers to or that the extracted matching term's Total Instances 77 member's value equals zero. When an extracted matching term is found that satisfies the above conditions, a simple deletion of the alert Term 72 to extracted matching Terms Inverted File 73 link is performed.
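A minimal sketch of the garbage collecting pass follows. It assumes, as one reading of the paragraph above, that either condition alone (a stale last modification time, or a total instance count of zero) is sufficient for deletion. The function name and the dictionary layout are hypothetical; setting the value to `None` models deleting the link while keeping the alert term itself.

```python
def collect_garbage(alert_terms, cutoff_time):
    """Unlink inverted files that are stale or empty.

    alert_terms: term -> {"total": int, "last_modified": float} or None
    Returns the list of terms whose link was deleted.
    """
    unlinked = []
    for term, inverted in alert_terms.items():
        if inverted is None:
            continue  # no inverted file is linked to this alert term
        if inverted["last_modified"] < cutoff_time or inverted["total"] == 0:
            unlinked.append(term)
    for term in unlinked:
        # The alert term stays in the table; only its link is removed.
        alert_terms[term] = None
    return unlinked
```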

According to another preferred embodiment of the invention, a single data structure can support both real time searches and alerts. The Terms Index Table will store alert criteria and received terms. An alert criterion will not be deleted from the terms index unless a client system requested such a deletion. Each entry of the table will have an additional field, for identifying the indexed term as at least a portion of an alert criterion or as a received extracted term. According to said embodiment, when storage means controller 57 receives an extracted term it determines whether said extracted term matches an alert term, and if the answer is ‘no’ said term is indexed in alert terms hash 56, with an indication that it is not an alert term. Said extracted term can be deleted from alert terms module 56 by message ID or by a garbage collecting process.

Referring to FIGS. 6-8, which illustrate method 101 for real time alerts, method 101 comprises the following steps:

Step 110 of receiving an information packet; said information packet is either provided by an information source or representative of a portion of a received signal provided by an information source.

Step 110 is followed by step 120 of extracting at least one extracted term out of the information packet.

Step 120 is followed by step 150 of determining whether an extracted term out of said at least one extracted term matches an alert term, and accordingly either discarding said extracted term (step 154) or updating (step 151) matching term information representative of a reception of matching extracted terms, an alert criterion comprising at least one alert term, said matching term information being stored in a storage means that is configured to allow fast insertion and fast deletion of content. The matching term information is also provided to relevancy determination unit 2.

Conveniently, step 150 is preceded by step 130 of receiving alert criteria from client systems and processing said criteria to update or create an entry in alert term table 72 and alert criteria map 100. Conveniently, step 154 is followed by step 110.

Steps 160 and 155 follow step 151. Step 160 is a step of processing at least a portion of the matching extracted term information to determine whether to issue an alert. Conveniently, said processing step can implement complex matching techniques, Boolean matching techniques, probabilistic matching techniques, fuzzy matching techniques, proximity matching techniques and vector based matching techniques. Said process can be based upon an analysis of the matching extracted term information representative of a reception of matching extracted terms from a single information source, said information source being identified by a channel ID. Conveniently, the portion of the matching extracted term information that is processed is determined by an alert criterion. Preferably, said alert criterion comprises the at least one matching extracted term received in step 110. If, for example, a matching extracted term is a part of an alert criterion, said alert criterion further comprising an additional alert term, a portion of matching extracted term information representative of both alert terms is processed in order to determine whether to issue an alert.
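The two-term example above can be illustrated with the simplest of the listed techniques, a Boolean AND evaluated per information source. The sketch is an assumption-laden illustration: the function names and the `channel_instances` layout (term mapped to per-channel instance counts) are hypothetical, and weighing factors and similarity thresholds are deliberately left out.

```python
def criterion_fulfilled(alert_terms, channel_instances, channel_id):
    """Boolean AND: every alert term was received on the given channel.

    channel_instances: term -> {channel_id: instance count}
    """
    return all(
        channel_instances.get(term, {}).get(channel_id, 0) > 0
        for term in alert_terms
    )


def channels_to_alert(alert_terms, channel_instances):
    """Return the channels on which all alert terms of a criterion matched."""
    candidates = set()
    for term in alert_terms:
        candidates.update(channel_instances.get(term, {}))
    return sorted(
        channel for channel in candidates
        if criterion_fulfilled(alert_terms, channel_instances, channel)
    )
```

Only the matching information of the terms named in the criterion is read, which corresponds to processing a portion of the matching extracted term information determined by the alert criterion.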

Step 160 is followed by step 170 of issuing at least one alert to at least one client system, according to said determination. Step 170 further comprises sending the alert to relevancy determination unit 2.

Step 155 is a step of determining to delete a message and accordingly deleting matching extracted term information representative of a reception of matching extracted terms extracted from said information packet.

Conveniently, steps 110 and 120 further comprise additional preprocessing steps, such as: step 111 of processing the plurality of information packets by adding control data to said information packets, the control data comprising information packet identification, information source identification and time of arrival; step 112 of filtering the plurality of information packets; step 113 of parsing and stemming the plurality of information packets; step 124 of processing said extracted terms by adding control information to said extracted terms; and step 125 of filtering the extracted terms to generate filtered extracted terms. Preferably, step 125 further comprises at least one of the following steps: step 1251 of discarding said terms constructed of one-letter words; step 1252 of discarding said terms constructed of frequently used words; step 1253 of discarding said terms constructed of stop-words; and step 1254 of discarding said terms constructed of predefined words.

Step 151 of updating matching extracted term information conveniently involves the steps of storing the information packet and related control data in the storage means; and linking between the stored information packet and the matching extracted term information. Preferably, step 151 comprises the following steps: step 1512 of increasing a value of total instances in a matching extracted terms inverted file associated with said matching extracted term; step 1513 of accordingly updating a value of last modification time in said matching extracted terms inverted file; step 1514 of inserting an information source identification, said information source having provided the extracted term, into a matching extracted terms inverted entry map table in said terms inverted file; step 1515 of increasing a value of instances number in said matching extracted terms inverted entry map table associated with said information source identification in said matching extracted terms inverted file; step 1516 of inserting information packet data in a messages hash table; step 1517 of inserting the matching extracted term from said information packet into a messages data table; step 1518 of increasing a value of instances in said messages data table by one; step 1519 of updating a value of message time in said messages data table; and step 1510 of updating a value of information source identification in said message data table.

Step 151 is followed by step 153 of deleting from the matching extracted terms index data structure the matching extracted term information representative of reception of matching extracted terms extracted from an information packet. Said deletion occurs after a message from which said term was extracted has been stored in the message buffer for a predetermined period of time. Said matching extracted term information can also be deleted as a result of a garbage collection process, said process being based upon a deletion of matching extracted terms that are not mentioned during a certain period.

Preferably, step 153 comprises the steps of: step 1531 of receiving an information packet identification, whereas the matching extracted term information representative of reception of matching extracted terms extracted from the information packet is to be deleted; step 1532 of reading the information packet identification from the messages hash table in said alert terms index data structure; step 1532 of obtaining relevant entries of said extracted terms belonging to said information packet in said messages data; step 1533 of accessing said matching extracted terms inverted file for each said terms entry pointing to said matching extracted terms inverted file; and step 1534 of decreasing a value of said total instances by a value of said instances number for each said terms entry pointing to said matching extracted terms inverted file. Step 153 further comprises step 1535 of deleting matching extracted term information by a garbage collection process.

Conveniently, step 130 comprises step 131 of receiving a request to update or create an alert criterion and processing the request by adding control data. Step 130 is followed by step 132 of filtering the request. Said filtering involves excluding said requests generated from predefined client systems. Step 130 is also followed by step 133 of parsing and stemming the alert criteria to generate alert terms and additional terms. Additional terms can define a relationship between alert terms, a weight factor associated with the alert terms and a similarity threshold, and can indicate which client systems are to receive an alert when said criteria are matched. Step 134 is followed by step 135 of processing the alert terms and additional information by adding relevant control information. Step 135 is followed by step 136 of filtering said alert terms and additional terms. Step 136 further comprises at least one of the following steps: step 1361 of discarding said alert terms constructed of one-letter words; step 1362 of discarding said alert terms constructed of frequently used words; step 1363 of discarding said alert terms constructed of stop-words; and step 1364 of discarding said alert terms constructed of predefined words. Step 136 is followed by step 137 of storing said alert terms in an alert term index data structure for a period that is shorter than a predefined period of time or until an alert criteria removal request is received from a user.

Conveniently, step 160 comprises step 161 of fetching each alert criterion that has an alert term that matches a matching extracted term that was received at step 110; step 162 of checking each alert criterion to determine which portion of matching extracted term information to fetch; step 163 of fetching said portion; and step 164 of processing said portion, in light of the alert criteria, to determine whether to issue an alert.

Conveniently, step 164 is based upon at least one of the following parameters: (i) a total amount of extracted terms provided by an information source in a predefined time interval; (ii) an elapsed time since the extracted term was provided by the information source in said predefined time interval; and (iii) an extracted term position in the information source.

Conveniently, step 164 involves computing a similarity between an alert criterion and information indicating a reception of a group of information packets. The similarity reflects at least one of the following parameters: a total amount of extracted terms received from at least one information source during a predefined time interval; a number of matching extracted terms received from at least one information source during the predefined time interval; a total number of information sources searched during the predefined time interval; an elapsed time since a last appearance of a matching extracted term from an information source during the predefined time interval; a position of matching extracted terms in at least one information source; an extracted term in proximity to a matching extracted term; a part of speech of a matching extracted term; and a matching extracted term frequency and importance in a language of the information source. Said similarity can be compared to a predefined similarity threshold, in order to determine whether to send an alert to a client system. Preferably, the group of at least one information packet comprises at least one information packet received from a single information source.

Step 170 comprises step 171 of determining to which client system to send an alert. Conveniently, step 171 is followed by step 172 of determining a format of an alert to be sent to a client system, according to a predefined client system format, and formatting the alert according to said client system format. Preferably, the predefined client system format is selected from a group consisting of: HTML format; WAP format; PDA compatible format; digital television compatible format; electronic mail format; and multimedia stream format.

Preferably, an alert comprises at least one field selected from a group consisting of: an information source identifier field, for identifying either an information source that provided a matching extracted packet or an information source that provided a received signal, a portion of said received signal being represented in an information packet from which the extracted term was extracted; a link field, for allowing the client system to be linked to the information source or for allowing the client system to receive additional information from said information source; and an information source category identification, identifying a category of the information source that provided the matching extracted term. Said additional information is selected from a group consisting of: a multimedia stream originated by said information source; a stream of information packets originated by said information source; a multimedia stream associated with the information packet from which the extracted term was extracted; and a stream of information packets comprising the extracted term.

Conveniently, a client system is configured to generate a unique information source category indication in response to a reception of said information source category identification and to generate a unique information source indication in response to a reception of said information source identification.

Search Module 26

The operation of search engine 26 is described in U.S. patent application titled “System and Method for Real Time Searching”, Ser. No. 09/655,185, filed Sep. 5, 2000 and assigned to eNow Inc., which is incorporated in its entirety by reference.

Referring now to FIG. 4, where the various software modules and data structures necessary for the operation of the Search Engine are shown. For clarity of the disclosure, FIG. 4 does not illustrate some portions of the distribution means 4, retrieval means 6 and analysis means 5 of FIG. 1.

FIG. 4 illustrates various optional modules/portions of search engine 26, such as, but not limited to, query index 258, real time query indexing module 277, archive search module 253, semi-static database search module 254, query coordinator 261, query filter 264, message coordinator 250, message filter 251, and terms filters 249 and 263. Search engine 26 has: Message Coordinator module 250, Message Filter module 251, Messages Buffer 252, Term Extractor modules 248 and 260, Terms Filter modules 249 and 263, Real Time Indexing module 257, Real Time Query Indexing module 277, Terms Index 256, future search module 259 for allowing a generation of alerts to a client system, Queries Index 258, query and results manager 255, user communication modules 266, 268, and 270, queries coordinator 261, query filter module 264, archive search module 253, and semi-static database search module 254. Although not part of the Search Engine, Users 265, 267, and 269 are shown connected to User Communication modules 266, 268, and 270 for the clarity of the disclosure only. Query and results manager 255 matches queries against terms index 256 to generate query results. Query and results manager 255 also matches alert criteria provided by future search module 259 against the content of terms index 256. Future search module 259 is also referred to as alert module 259.

Although information packets will be referred to as messages, and information sources will be referred to as channels, in the text of this document, it will be appreciated that in different embodiments of the present disclosure other sources of information could be used, such as news channels, video channels, music channels, various Internet sites and the like. It will also be appreciated that in other embodiments of the present disclosure the information packets processed could be, in addition to text format, in other diverse data formats such as streaming video, still pictures, sound, applets and the like.

The messages are received by Messages Coordinator module 250 and are processed accordingly. The transferred messages consist of control data, such as channel ID, message ID, and a timestamp of the time of arrival, and information content such as a phrase, a sentence, a news item, a music item or a video item.

Messages Coordinator 250 coordinates the handling of the incoming messages, and provides processed messages to Term Extractor 248 and to Messages Buffer 252. Messages Buffer 252 is a data structure that temporarily holds the incoming messages. In the preferred embodiment of the present disclosure, Messages Buffer 252 is a cyclic buffer. Message Filter 251 filters messages according to user-defined rules. For example, messages with a specific channel ID or messages containing specific text might be blocked and discarded.
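By way of non-limiting illustration only, the cyclic behavior of Messages Buffer 252 and the rule-based blocking of Message Filter 251 may be sketched as follows. The class and function names, the fixed capacity, and the age-based expiry are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    channel_id: str
    message_id: int
    arrival_time: float  # seconds, hypothetical time base
    content: str


class MessagesBuffer:
    """Fixed-capacity cyclic buffer: inserting into a full buffer evicts the oldest message."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = deque()

    def insert(self, message):
        """Add a message at one end; return the message evicted from the other end, if any."""
        evicted = None
        if len(self._items) == self._capacity:
            evicted = self._items.popleft()
        self._items.append(message)
        return evicted

    def expire(self, now, max_age):
        """Remove and return time-decayed messages (older than max_age seconds)."""
        expired = []
        while self._items and now - self._items[0].arrival_time > max_age:
            expired.append(self._items.popleft())
        return expired


def passes_filter(message, blocked_channels, blocked_text):
    """Hypothetical user-defined rule: block by channel ID or by contained text."""
    if message.channel_id in blocked_channels:
        return False
    return not any(fragment in message.content for fragment in blocked_text)
```

Every message returned by `insert` or `expire` is a message whose term references would then be removed from the term index.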

Term Extractor 248 receives the messages from Messages Coordinator 250 and performs message parsing and stemming (finding the lexicographic root) of the resulting terms. Once the message is parsed and stemmed, a list of terms within said message is created. The extracted terms are sent for further processing accompanied by identifying data such as channel ID, message ID and the message arrival time. Terms Filter 249 passes the terms through a series of filters, which can change or discard specific terms. For example, Terms Filter 249 can discard stop-words, frequently used words, one-character words, user-defined words, and system-defined words such as “a”, “about”, “else”, “this”, and the like. According to an aspect of the invention, the frequently used words are utilized for determining the flow characteristics of incoming data.
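The parsing, stemming and filtering described above can be sketched, purely as an example, in the following manner. The suffix-stripping stemmer is a crude stand-in assumed for this sketch (the disclosure does not name a stemming algorithm), the stop-word list contains only the example words given above, and the filter is applied to the raw token before stemming for simplicity.

```python
import re

# Example system-defined words taken from the text; a real list would be longer.
STOP_WORDS = {"a", "about", "else", "this"}


def naive_stem(token):
    """Crude suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(channel_id, message_id, arrival_time, text):
    """Parse a message into terms and attach the identifying data to each term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        # Terms filter: discard one-character words and stop-words.
        if len(token) == 1 or token in STOP_WORDS:
            continue
        terms.append((naive_stem(token), channel_id, message_id, arrival_time))
    return terms
```

Each returned tuple carries the term together with its channel ID, message ID and arrival time, which is the form in which terms are handed on for indexing.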

Real Time Indexing module 257 accepts the terms and stores them in Terms Index 256. Real Time Indexing module 257 also periodically schedules and initiates a process that removes irrelevant or time-decayed terms from Terms Index 256. A description of the process will be set forth hereunder.

Terms Index 256 consists of indexed terms and message identifiers that point to information relating to a reception of said messages and indexed terms during a predetermined period of time. Terms Index 256 is designed to enable fast term indexing and deletion. The indexing is done per term, while deletion is done per message. When a message is discarded for becoming irrelevant or time-decayed, all terms that refer to this message are deleted from Terms Index 256. Terms Index 256 is a means to realize real time search of real time content, which is one of the search capabilities of the Search Engine module.

Alert module 259 functions in conjunction with Queries Index 258. Unlike Real Time Indexing module 257, alert module 259 matches incoming terms from the message stream against a database of more or less static queries. Therefore, alert module 259 has the ability to search for a term that is relevant to a query that was initiated at some point in time in the past, as long as the relevant query is kept in Queries Index 258. Alert module 259 enables the return of query results during a predefined time frame that begins at the query's arrival time.

Queries Index 258 holds queries for a predefined time frame in order to provide alert module 259 with the means to match terms of queries against the terms of the incoming messages. Queries Index 258 thus makes it possible to return future results to queries.
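As a minimal sketch of this arrangement, the following example keeps stored queries indexed by term and matches each incoming term against the queries that are still within their time frame. The class name, the `retention` parameter and the dictionary layout are assumptions of this sketch only.

```python
from collections import defaultdict


class QueriesIndex:
    """Stored queries indexed by term, each kept for a predefined time frame."""

    def __init__(self, retention):
        self._retention = retention          # length of the time frame, in seconds
        self._by_term = defaultdict(dict)    # term -> {query_id: arrival_time}

    def add_query(self, query_id, terms, arrival_time):
        """Index the query's control information against all of its terms."""
        for term in terms:
            self._by_term[term][query_id] = arrival_time

    def match(self, term, now):
        """Return IDs of stored queries that contain the incoming term and have not expired."""
        hits = self._by_term.get(term, {})
        return sorted(q for q, arrived in hits.items() if now - arrived <= self._retention)
```

A term extracted from a newly received message is passed to `match`; any query ID returned identifies a query, initiated in the past, for which a future result (or an alert) can now be produced.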

According to one preferred embodiment of the invention, queries are inserted into Queries Index 258 by queries coordinator 261. According to another preferred embodiment of the invention, said queries also pass through query term extractor 260 and real time query indexing module 277, and undergo preprocessing steps that are analogous to the preprocessing steps of a message. Queries can contain several terms. Therefore, the relevant control information associated with each query, such as query ID, timestamp and the like, is indexed against all the terms of the query.

Query and Results Manager module 255 handles the queries and returns results to the queries by establishing a unified result from all the result sources except Future search module 259. The result sources are the following: (a) search in Real Time Indexing module 257, (b) search in the semi-static database by semi-static database search module 254, and (c) search in the archive database by archive search module 253.
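The disclosure does not specify how the result sources are unified. Purely as an assumed example, the sketch below represents each result source as a mapping from channel ID to score and keeps, for each channel, the highest score reported by any source.

```python
def unify_results(*result_sources):
    """Merge per-source {channel_id: score} mappings into one sorted result list.

    Taking the maximum score per channel is an assumption of this sketch;
    other combination rules (for example, a weighted sum) are equally possible.
    """
    combined = {}
    for source in result_sources:
        for channel_id, score in source.items():
            combined[channel_id] = max(combined.get(channel_id, 0.0), score)
    # Highest score first; ties are broken by channel ID for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

For example, real time, semi-static and archive results can be passed as three separate mappings, and the return value is a single sorted list of channel IDs with one score per channel.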

Query and Results Manager module 255 is also operable to send the results of at least the search in Real Time Indexing module 257 to relevancy determination unit 2.

The results from future search module 259 are passed through Query and Results Manager 255, which sends the results on to users 265, 267, and 269 via User Communication modules 266, 268, and 270. Typically, a result consists of a sorted list of channel IDs and a score for each channel that reflects the channel/query match. Dispatcher means are operable to transfer queries initiated by the users to the Search Engine module and return results back to the users.

When a complex search is performed, Query and Results Manager 255 analyzes information regarding various receptions of information packets, said information packets originating from a single information source.

Queries Coordinator 261 functions similarly to Messages Coordinator 250, only with queries instead of messages. Queries Coordinator 261 receives queries from User Communication modules 266, 268, and 270 and inserts the queries into Queries Buffer 262. Upon a request from Query and Results Manager 255, Queries Coordinator 261 fetches one query from Queries Buffer 262 and passes it via Terms Filter 263 to Term Extractor 260. The extracted terms of the query are inserted by real time query indexing module 277 into Queries Index 258.

According to one preferred embodiment of the invention, Queries Buffer 262 holds the queries in the same manner as the messages are held in Messages Buffer 252. Queries Buffer 262 is a data structure that temporarily holds the incoming queries. In the preferred embodiment of the present disclosure, Queries Buffer 262 is a cyclic buffer.

According to another preferred embodiment of the invention, said queries buffer holds a plurality of alert criteria, and each alert criterion is stored in said buffer until the client that provided said alert criterion deletes it.

Archive search module 253 acts on the archived data files of a channel by indexing the data and by returning results according to the indexed data. The archived data files, through archive search module 253, are a result source for Query and Results Manager 255.

Semi-static database search module 254 acts on the semi-static database, which is an index holding semi-static channel information such as channel ID, channel description, name, topic, and keywords. The database is described as “semi-static” because the information therein is structured (i.e., said information is associated with information fields), is relatively small and changes infrequently. The semi-static database, via semi-static database search module 254, is a result source for Query and Results Manager 255.

It will be appreciated that other forms of search could be contemplated in other embodiments, such as thesaurus-mode search or historical-mode search. Therefore, the above description should not be interpreted as a limitation to the present disclosure.

The operation of the Search Engine module will be described next. Information packets are extracted out of incoming information streams. The messages are structured, time-stamped and transferred to the operative modules of the Search Engine. The structured messages contain control data such as channel ID, message ID, and a time stamp indicative of the time of arrival, and content information such as textual data. The messages are transferred through Message Filter 251, which blocks specific messages according to predefined rules. For example, messages originating in particular channels, having specific text content, or having particular characteristics could be discarded. The filtered messages are inserted into Messages Buffer 252, which is managed and synchronized by Messages Coordinator 250. Messages Coordinator 250 operates in conjunction with Messages Buffer 252, which is designed to hold the messages to be retrieved for later processing. Messages Buffer 252 is a cyclic buffer. Incoming messages are inserted at one end of Messages Buffer 252 and retrieved from the other end. The messages are kept in the buffer for a predefined period of time. Time-decayed messages may be discarded. In other embodiments of the disclosure, other methods could be used to delete messages from Messages Buffer 252, such as deletion by predefined priorities. For example, messages from a specific low-priority channel could be discarded first. When a message is deleted from Messages Buffer 252, information relating to the reception of the terms that were extracted from said message is deleted from Terms Index 256.

Messages Coordinator 250 provides messages to Term Extractor 248. Term Extractor 248 performs message parsing and stemming (finding the lexicographic root) of the resulting tokens, and extracts the tokens from the messages. The tokens are transferred through a series of Terms Filters 249. Terms Filters 249 can change or discard a token according to predefined parameters. For example, Terms Filters 249 can discard stop-words, one-letter words, frequently used words, user-predefined words and the like.

The tokens are structured into operative terms to be used by other Search Engine modules after Term Extractor 248 attaches identifiers to the tokens, such as channel ID, message ID and time of arrival. Finally, Term Extractor 248 dispatches the terms to Real Time Indexing module 257.

The purpose of Real Time Indexing module 257 is to provide a search capability over text received in the recent past. Real Time Indexing module 257 receives the terms from Term Extractor 248 and stores the operative terms into Terms Index 256, which is a dynamic data structure designed to cope with the requirement for fast indexing of terms and for fast deletion of all references to terms related to a specific message. In addition, Real Time Indexing module 257 performs a periodic scan for non-used terms in Terms Index 256. Non-used terms are defined as terms that are not referenced for a predefined period of time. Periodically, a garbage collection process is initiated by Real Time Indexing module 257 in order to delete the non-used terms.

The search-related element of Terms Index 256 is a data structure containing entries indexed by terms and holding term-related information such as a channel ID. As a result, fast insertion and indexing of terms is accomplished.

A more detailed description of the operations related to inserting terms into and removing terms from Terms Index 256 will be set forth hereunder in association with the related drawing.

Users initiate queries. User Communication modules 266, 268, and 270 transfer the queries from the users into the Search Engine modules. Queries hold one or more terms. Conveniently, the handling of a query by the Search Engine modules is analogous to the handling of an incoming message. Queries are filtered by Query Filter 264 and handled by Queries Coordinator 261. Queries Coordinator 261 functions with respect to the incoming queries in the same manner as Messages Coordinator 250 functions with respect to the incoming messages. Queries Coordinator 261 receives the queries from User Communication modules 266, 268, and 270 and transfers the queries to Term Extractor 260. Term Extractor 260 parses the queries and stems the resulting tokens. The tokens are filtered by a series of Terms Filters 263, structured into query-terms by the attachment of control information such as query ID and time stamp, and returned to Queries Coordinator 261 to be inserted into Queries Index 258 in order to be matched later against the operative terms in Terms Index 256.

Queries Index 258 holds query-terms for a predefined period of time to enable queries to be matched against the stream of incoming message terms. Queries Index 258 thus provides the capability to collect future results to queries. The above-mentioned capability is accomplished in conjunction with Future Search module 259.

Future Search module 259 operates in conjunction with Queries Index 258 by matching terms from the incoming stream of messages against a database of relatively static queries. Said database can hold alert criteria, and system 1 can dispatch an alert to a client system when an alert criterion is matched. Subsequently, a query that was initiated in the past can be matched against newly inserted terms as long as the query is kept in Queries Index 258. This type of search is defined as the “future search mode” in contrast to the “real-time search-mode”.

Scoring, or ranking, of channels to be returned as a result is done using a model that computes the similarity between the query and the channel. Some of the parameters involved in computing the results are: the total amount of terms in the channel in the predefined time interval, the number of relevant terms in the channel in the predefined time interval, the total number of channels searched in the predefined time interval, the elapsed time since the last appearance of the relevant term in the channel in the predefined time interval, and the position of relevant terms in the channel. Additional factors for the score are: terms in proximity to a relevant term, the part of speech of relevant terms, and the frequency and importance of a relevant term in the language of the channel.

The parameters enable Query and Results Manager 255 to rank the resulting channels, in addition to standard ranking methods, by the time parameter as well as by giving more weight to phrases than to a collection of single words.
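The disclosure lists the scoring parameters but gives no formula. The following sketch is therefore only one assumed model: it combines two of the listed parameters, the share of relevant terms in a channel and the elapsed time since the last relevant term, using a hypothetical half-life decay.

```python
def channel_score(relevant_terms, total_terms, seconds_since_last, half_life=300.0):
    """Assumed score: term density in the interval, discounted by recency."""
    if total_terms == 0:
        return 0.0
    density = relevant_terms / total_terms
    recency = 0.5 ** (seconds_since_last / half_life)  # halves every half_life seconds
    return density * recency


def rank_channels(stats):
    """stats maps channel_id -> (relevant_terms, total_terms, seconds_since_last)."""
    scored = [(channel_score(*values), channel_id) for channel_id, values in stats.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [(channel_id, score) for score, channel_id in ordered]
```

Under this assumed model, two channels with the same share of relevant terms are ordered by how recently the relevant term last appeared, which reflects the ranking by the time parameter mentioned above.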

Referring now to FIG. 5, which illustrates the structure of the Terms Index 256 tables. The Terms Index consists of two main units: the Terms Hash 271 and the Messages Hash 280. Additionally, the Terms Index contains the Channel Map unit 294.

Terms Hash 271 comprises the Term table 272 and the associated Terms Inverted File 273. Terms Hash 271 comprises entries whose keys are terms. Therefore, Terms Hash 271 provides fast access to the entries by using terms as access keys. Said structure also provides for fast insertion of terms into the table.

The Terms Inverted File 273 comprises a sorted list of entries, Terms Inverted Entries Map 278, and at least one of the following fields: (a) a total number of references (Total Instances) 277 to the term in all the messages currently stored in Messages Buffer 252 of FIG. 4, (b) the modification time of the term (Last Modification Time) 274, or (c) a number of channels that contain the term 276. Each entry, such as entry 286 in Terms Inverted Entries Map 278, is keyed by the channel ID 287 and holds the number of references (Instances No) 288 to the term in that channel and the time of the last appearance of the term in the channel (Time of Last Appearance) 289. The number of references that are added to the Total Instances 277 could be used to determine the channel's relevance to a specific query.

Messages Hash 280 is indexed by Message ID 281 in order to provide fast deletion of a term's references by message. Messages Hash 280 comprises Message ID table 281 and the associated Message Data table 290. Each entry in Message Data table 290 contains information about one message and is pointed to by a Messages Hash entry 281. Message Data table 290 consists of (a) the channel ID 293, (b) message time 292, and (c) Message Terms Keyed Map 291. The Message Terms Keyed Map 291 is a sorted list of Message Characteristics Entries 282. Each entry is keyed by a pointer 283, which is unique to each term. Therefore, a Message Characteristics Entry 282 can be found easily for a specific term. Message Characteristics Entry 282 contains the following information: (a) the number of times the related term was referred to in the relevant message (Instances No) 284, and (b) a pointer to the related Inverted File Entry 285.

The Channel Map 294 is a list sorted by channel IDs 295. For each channel ID 295, Channel Map 294 holds the total number of currently indexed terms that belong to the channel 296. In the preferred embodiment of the present disclosure, said total number relates to the number of terms after filtering. In a different embodiment of the present disclosure, the total number could relate to the number of terms before filtering or to the average of both values.

The operations supported by the Terms Index 256 of FIG. 4 will be described next. Terms Index 256 of FIG. 4 supports three modes of operation: (1) term insertion, (2) terms deletion by message ID, and (3) term deletion by the garbage collection process.

Term insertion is performed by Term Extractor 248 of FIG. 4 when handling a newly extracted term from an incoming message. The term is indexed in this mode of operation by Term, Message Id, Channel Id and Message Time. When inserting a Term the following sequence of steps is performed:

The Term 272 to Terms Inverted File 273 link is accessed or created. A pointer to the Terms Inverted File (invertedFilePtr) is saved.

The value of the Total Instances 277 member in the Terms Inverted File 273 pointed at by invertedFilePtr is increased by one.

The Last Modification Time 274 member in the Terms Inverted File 273 pointed at by invertedFilePtr is updated.

The entry for channel Id 287 in Terms Inverted Entries Map 278 is accessed or created. A pointer to the entry is saved as invertedFileEntryPtr.

The value of the Instances No 288 member in the entry pointed at by invertedFileEntryPtr is increased by one.

The appropriate Message Data is accessed or created in Messages Hash 280. A pointer to the entry is saved as messageData.

The Message Characteristics Entry 282 in Message Data 290/Message Terms Keyed Map 291 is accessed by invertedFilePtr or created. A pointer to the entry is saved as messagecharac.

In the entry pointed at by messagecharac, the value of the Instances No 284 member is increased by one.

In the entry pointed at by messagecharac, the invertedFileEntry pointer is set to point at invertedFileEntryPtr.

In the Message Data 290, the Message Time 292 member is updated.

In the Message Data 290 the channel ID 293 member is updated.

Term deletion by Message Id occurs when a message is deleted. A message can be deleted when the Messages Buffer 252 of FIG. 4 is full or when a predetermined time interval indicative of the period a message should be kept in the buffer 252 has been completed. For term deletion by Message Id the following sequence of steps is performed:

The appropriate Message Terms Keyed Map 291 is obtained from Messages Hash 280.

For each Message Characteristics Entry 282 that points to Terms Inverted File 273:

-   The pointed Terms Inverted File 273 is accessed and Total Instances 277 member's value is decreased by the Instances No 284 member's value in Message Characteristics Entry 282.
-   The Term Inverted Entry 286 is accessed and the Instances No 288 value is decreased by Message Characteristics Entry's local Instances No member 284 value.
-   Message Characteristics Entry 282 is deleted.
-   Steps ‘c’ through ‘e’ are repeated until Message Terms Keyed Map 291 is empty.
-   The Message Id 281/Message Terms Keyed Map 291 link is deleted.

Deleting a term not via Message Id 281 is done periodically by the garbage collecting process. The deletion is performed if the term's last modification time occurred before a specific point in time in the past which implies that there are currently no messages that the specific term refers to or that the term's Total Instances 277 member's value equals zero. When a term is found that satisfies the above conditions a simple deletion of the Term 272 to Terms Inverted File 273 link is performed.
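The three modes of operation of the Terms Index can be illustrated, under simplifying assumptions, by the sketch below. Python dictionaries keyed by term and by message ID stand in for the Terms Hash and Messages Hash, terms are looked up by key rather than through stored pointers, the Channel Map is omitted, and the garbage collection removes a term when either condition (stale modification time or zero total instances) holds. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelEntry:            # one entry of the Terms Inverted Entries Map
    instances: int = 0
    last_appearance: float = 0.0


@dataclass
class InvertedFile:            # Terms Inverted File of a single term
    total_instances: int = 0
    last_modified: float = 0.0
    by_channel: dict = field(default_factory=dict)   # channel_id -> ChannelEntry


@dataclass
class MessageData:             # one entry of the Message Data table
    channel_id: str
    message_time: float
    term_counts: dict = field(default_factory=dict)  # term -> instances in this message


class TermsIndex:
    def __init__(self):
        self.terms = {}        # Terms Hash: term -> InvertedFile
        self.messages = {}     # Messages Hash: message_id -> MessageData

    def insert(self, term, message_id, channel_id, message_time):
        """Term insertion: update the inverted file, channel entry and message data."""
        inverted = self.terms.setdefault(term, InvertedFile())
        inverted.total_instances += 1
        inverted.last_modified = message_time
        entry = inverted.by_channel.setdefault(channel_id, ChannelEntry())
        entry.instances += 1
        entry.last_appearance = message_time
        data = self.messages.setdefault(message_id, MessageData(channel_id, message_time))
        data.term_counts[term] = data.term_counts.get(term, 0) + 1
        data.message_time = message_time
        data.channel_id = channel_id

    def delete_message(self, message_id):
        """Deletion by message ID: subtract the message's counts from every term it used."""
        data = self.messages.pop(message_id, None)
        if data is None:
            return
        for term, count in data.term_counts.items():
            inverted = self.terms.get(term)
            if inverted is None:          # term already removed by garbage collection
                continue
            inverted.total_instances -= count
            inverted.by_channel[data.channel_id].instances -= count

    def collect_garbage(self, cutoff_time):
        """Remove terms with no remaining instances or not modified since cutoff_time."""
        stale = [term for term, inverted in self.terms.items()
                 if inverted.total_instances == 0 or inverted.last_modified < cutoff_time]
        for term in stale:
            del self.terms[term]
        return stale
```

Because each message records how many times it used each term, deleting a message touches only the terms of that message, which is the property the per-message Messages Hash is intended to provide.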

Conveniently, system 1 can provide alerts in various manners. According to a first embodiment of the invention, future search module 259 matches a plurality of alert criteria against the content of terms index 256. According to a second embodiment of the invention, terms index 256 has an additional field, associated with each term, indicating whether said term is a part of an alert criterion or not. If so, said term is not deleted from terms hash 271 unless a client system requested to delete it. When a real time search is performed, the whole content of the terms hash is checked, while an alert is based upon a check of only the terms identified as a part of the alert criteria.

Referring to FIGS. 9-10, illustrating a method 300 for real time search, method 300 comprises steps 310, 330 and 350 and additional optional steps. Method 300 starts at step 310 of receiving a client query, said client query regarding a content of at least one information packet. Step 331 is followed by step 330.

Step 330 comprises matching at least a portion of said client query against at least a portion of a plurality of extracted terms to generate a query result, said extracted terms being extracted out of a plurality of information packets provided from a plurality of information sources, and said extracted terms being stored in a storage means for up to a predetermined period of time. Conveniently, the storage means is a term index data structure. According to an aspect of the invention, the results are also provided to the relevancy determination unit.

Conveniently, step 330 is preceded by step 340 of building and updating the term index data structure. The term index data structure may include relevancy keywords, and thus the relevancy determination unit may know its content.

Step 340 comprises at least one of the following steps: step 341 of processing the plurality of information packets by adding control data to said information packets, the control data comprising information packet identification, information source identification and time of arrival; step 342 of filtering the plurality of information packets; step 343 of parsing and stemming the plurality of information packets; step 344 of processing said extracted terms by adding control information to said extracted terms; and step 345 of filtering the extracted terms to generate filtered extracted terms. Preferably, step 345 further comprises at least one of the following steps: step 3161 of discarding said terms constructed of one-letter words; step 3162 of discarding said terms constructed of frequently used words; step 3163 of discarding said terms constructed of stop-words; and step 3164 of discarding said terms constructed of predefined words.

Step 346 comprises storing an extracted term in a term index data structure. Step 346 preferably comprises the following steps: inserting the extracted term into a terms hash table and into a terms inverted file; increasing a value of total instances in said terms inverted file; updating a value of last modification time in said terms inverted file; inserting an information source identification, said information source having provided the extracted term, into a terms inverted entry map table in said terms inverted file; increasing a value of instances number in said inverted entry map table associated with said information source identification in said terms inverted file; inserting information packet data in a messages hash table; inserting the extracted term from said information packet into a messages data table; increasing a value of instances in said messages data table by one; updating a value of message time in said messages data table; and updating a value of information source identification in said messages data table. It is noted that some of these steps are illustrated in FIG. 8.

Step 346 is followed by step 347 of deleting the extracted term from the terms index data structure. Said deletion occurs either after the message from which said term was extracted has been stored in the message buffer for a predetermined period of time, or as a result of a garbage collection process, said process being based upon a deletion of terms that are not mentioned during a certain period.

Preferably, step 347 comprises the steps of: receiving an information packet identification, whereas the terms extracted from the identified information packet are to be deleted; reading the information packet identification from the messages hash table in said terms index data structure; obtaining relevant entries of said extracted terms belonging to said information packet in said messages data; accessing said terms inverted file for each said terms entry that points to said terms inverted file; and decreasing a value of said total instances by a value of said instances number for each said terms entry that points to said terms inverted file. Step 347 further comprises a step of deleting an extracted term by a garbage collection process, in which a link between said term in said terms hash table and said terms inverted file is canceled. It is noted that some of these steps are illustrated in FIG. 8.

Conveniently, step 310 is followed by step 311 of processing the client query by adding control data to said client query. Step 310 is followed by step 312 of filtering the client query. Said filtering involves excluding said information packets generated from predefined client systems. Step 310 is also followed by step 314 of parsing and stemming the client query to generate query terms. Step 314 is followed by step 315 of processing the query terms by adding relevant control information to the query terms. Step 315 is followed by step 316 of filtering said query terms. Step 316 further comprises at least one of the following steps: step 3161 of discarding said terms constructed of one-letter words; step 3162 of discarding said terms constructed of frequently used words; step 3163 of discarding said terms constructed of stop-words; and step 3164 of discarding said terms constructed of predefined words. Step 316 is followed by step 317 of storing said query terms in a term index data structure for a period that is shorter than a predefined period of time or until a query removal request is received from a user.

Conveniently, method 300 allows performing more than a single search mode. In addition to a first mode in which an incoming client query is matched against a content of the storage means, method 300 comprises steps 320, 321 and 322 for allowing additional search modes. When more than a single search mode is selected, results of some search modes are unified to provide a single search result.

A path comprising steps 320 and 332 allows providing alerts. Said path starts with step 320 of storing client queries, which follows step 310. Conveniently, step 320 comprises a step of updating queries index 258. Step 320 is followed by step 332 of matching client queries/alert criteria received and processed in the past against newly received terms to generate an alert.

Step 321 of matching the client query against historical archives of informational content to generate an archive query result is followed by step 334 of processing the archive query result and a result of step 330 to generate the query result.

Step 322 of matching the client query against a semi-static database of said informational content, said database having a low incidence of change, to generate a semi-static query result is followed by step 335 of processing the semi-static query result and a result of the step of matching at least a portion of said client query against at least a portion of a plurality of extracted terms to generate the query result.

Conveniently, a query result comprises at least one information source, said at least one information source having provided a matching information packet. Step 330 further comprises a step 336 of ranking information sources according to a similarity between at least a portion of the information packets provided by said information sources and the client query. Preferably, said ranking process is based upon at least one of the following parameters: (a) a total amount of extracted terms provided by an information source in a predefined time interval; (b) an elapsed time since the extracted term was provided by the information source in said predefined time interval; and (c) an extracted term position in the information source.

Relevancy Calculation

FIG. 11 illustrates a method 440 of determining a relevancy of a keyword, in accordance with a preferred embodiment of the invention.

Method 440 starts by step 442. According to a first aspect of the invention, step 442 includes determining relevancy keywords. According to a second aspect of the invention, step 442 further comprises determining flow keywords or determining a manner in which incoming data stream flows are measured or estimated. According to a third aspect of the invention, step 442 further includes determining weight factors to be associated with information sources that provide the received data streams from which real time terms are extracted. For convenience of explanation it is assumed that step 442 includes determining flow keywords and relevancy keywords, but as mentioned above this is not necessarily so.

Step 442 is followed by step 444 of receiving information streams and extracting real time terms.

Step 444 is followed by step 446 of comparing the real time terms to the relevancy keywords and (according to the first aspect of the invention) to the flow keywords and accordingly updating a current reception pattern for each received relevancy keyword, in response to the reception of the relevancy keyword and the overall reception of flow keywords. It is noted that each received real time term is associated with timing information. The timing information may be processed in response to time zone information, but this is not necessarily so. It is further noted that each extracted term may be associated with an indication of its origin, and that origin may be associated with a weight factor.
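
One possible way of carrying out the update of step 446 is sketched below. The class `ReceptionPattern`, the use of a per-source weight as a multiplier, and the normalization of keyword receptions by the overall reception of flow keywords through simple division are assumptions made for the example; the description does not prescribe a particular formula.

```python
from collections import defaultdict


class ReceptionPattern:
    """Hypothetical current reception pattern for a set of relevancy keywords."""

    def __init__(self, relevancy_keywords, flow_keywords):
        self.relevancy_keywords = set(relevancy_keywords)
        self.flow_keywords = set(flow_keywords)
        self.keyword_hits = defaultdict(float)  # weighted receptions per relevancy keyword
        self.flow_hits = 0.0                    # weighted receptions of all flow keywords

    def update(self, terms, source_weight=1.0):
        # Step 446: compare each extracted term with both keyword sets.
        for term in terms:
            if term in self.relevancy_keywords:
                self.keyword_hits[term] += source_weight
            if term in self.flow_keywords:
                self.flow_hits += source_weight

    def normalized(self, keyword):
        # Assumed normalization: keyword receptions relative to the overall flow.
        if self.flow_hits == 0:
            return 0.0
        return self.keyword_hits.get(keyword, 0.0) / self.flow_hits


pattern = ReceptionPattern({"merger"}, {"the", "and"})
pattern.update(["the", "merger", "and", "the"])
pattern.update(["merger", "the"], source_weight=2.0)
print(pattern.normalized("merger"))
```

Normalizing by flow keywords is intended to keep a keyword from appearing more relevant merely because the overall volume of received information has increased.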

Step 446 is followed by step 448 of comparing the current reception pattern to the previous reception pattern of each relevancy keyword that was received during the test period and, in response, determining the relevancy level of each of the received relevancy keywords. It is noted that the comparison may take into account the origin of the extracted terms.
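
The comparison of step 448 may, for example, map a normalized reception value to one of nine relevancy levels using the 24 hour thresholds recited in claim 136. The sketch below is illustrative only: the function name is hypothetical, and it is assumed that avg and std denote the mean and the standard deviation of the reference (previous) reception pattern and that avg is not negative.

```python
import math


def relevancy_level_24h(hrv, avg, std):
    """Map a 24 hour normalized reception value to a relevancy level in -4..4."""
    if avg < 0:
        raise ValueError("avg is assumed to be non-negative")
    k = 1.0 / math.log(1.05 + avg)  # the 1 / ln(1.05 + avg) term of the claim
    # (level, factor): the level applies when hrv <= avg + factor * std.
    upper_bounds = [
        (-4, -1.0),
        (-3, -0.8),
        (-2, -0.65),
        (-1, -0.5),
        (0, 0.25 + 0.25 * k),
        (1, 0.85 + 0.5 * k),
        (2, 1.5 + 0.75 * k),
        (3, 2.2 + 1.0 * k),
    ]
    for level, factor in upper_bounds:
        if hrv <= avg + factor * std:
            return level
    return 4  # greater than the highest threshold


for value in (8.0, 10.0, 13.0, 20.0):
    print(value, relevancy_level_24h(value, avg=10.0, std=2.0))
```

The logarithmic term widens the positive bands for keywords with a low average reception, so that a rarely received keyword needs a proportionally larger increase before it is assigned a high relevancy level.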

Step 448 is followed by step 450 of updating the client in response to relevancy keyword statistics. The update may reflect the most relevant keywords out of data streams that are provided by system 1 to the clients.

Relevancy keywords and flow keywords may be updated, even during the execution of other steps of method 440, as illustrated by step 443 of updating relevancy keywords and flow keywords (according to a first aspect of the invention). Step 443 is preceded by step 442 and is followed by step 442.

The current reception pattern includes information reflecting a reception of relevancy keywords during the test period. Conveniently, the test period is of a predefined length (such as the last 12 or last 24 hours). Whenever an event of receiving a relevancy keyword or a flow keyword exits the test period, the event may be utilized for calculating the previous reception pattern. Accordingly, step 450 is followed by step 452 of updating the current reception pattern and the previous reception pattern. Step 452 is followed by step 444.
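
The sliding test period of step 452 may be sketched as follows. The class `SlidingReception`, the representation of reception events as (timestamp, keyword) pairs, and the assumption that events arrive in non-decreasing timestamp order are illustrative choices and are not taken from the description.

```python
from collections import deque


class SlidingReception:
    """Hypothetical bookkeeping of current and previous reception events."""

    def __init__(self, test_period_seconds=24 * 3600):
        self.test_period = test_period_seconds
        self.current = deque()  # events still inside the test period, oldest first
        self.previous = []      # events that have exited the test period

    def receive(self, timestamp, keyword):
        # Record a reception event, then age out events that left the test period.
        self.current.append((timestamp, keyword))
        self.expire(timestamp)

    def expire(self, now):
        # Step 452: an event that exits the test period moves to the previous pattern.
        while self.current and self.current[0][0] <= now - self.test_period:
            self.previous.append(self.current.popleft())


window = SlidingReception(test_period_seconds=10)
window.receive(0, "merger")
window.receive(5, "storm")
window.receive(12, "merger")
print(list(window.current), window.previous)
```

Because events are appended in time order, only the oldest end of the queue needs to be inspected on each update.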

FIG. 12 illustrates a screen in which relevant keywords are painted according to their relevancy level. The relevancy keywords are arranged in a folder-like manner and the folder title is painted in accordance with the most relevant relevancy keyword.

It will be apparent to those skilled in the art that the disclosed subject matter may be modified in numerous ways and may assume many embodiments other than the preferred form specifically set out and described above.

Accordingly, the above disclosed subject matter is to be considered illustrative and not restrictive, and to the maximum extent allowed by law, it is intended by the appended claims to cover all such modifications and other embodiments which fall within the true spirit and scope of the present invention. The scope of the invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents rather than the foregoing detailed description.

1-101. (canceled)
102. A computer implemented method for determining relevancy of real time received terms, the method comprising: determining a relevancy keyword; extracting a real time term from a currently received information stream; updating a current reception pattern of the relevancy keyword in response to a comparison between the extracted real time term and the relevancy keyword; and determining relevancy of the relevancy keyword in response to a comparison between the current reception pattern and a reference reception pattern.
103. The method of claim 102, wherein the relevancy keyword is extracted from an alert criterion of a client or a client query.
104. The method of claim 102, further comprising a step selected from the group consisting of: updating at least one client as to the relevancy of the relevancy keyword, estimating a flow pattern of the received information stream to generate an estimated flow pattern, storing the real time terms in a storage means for a predetermined period of time, wherein storing the real time term is preceded by a preprocessing step selected from the group consisting of: adding control data to an information packet, filtering the information packet, adding control information to the filtered information packet, extracting the real time term from the filtered information packet, filtering the real time term to generate the real time term, storing the real time term in a storage means, and combinations thereof, compensating for time differences resulting from a reception of an information stream from a distinct geographical location, and compensating for time differences resulting from a reception of an information stream relating to an event that occurs at a distinct geographical location, and combinations thereof.
105. The method of claim 104, wherein the current reception pattern of the relevancy keyword is further responsive to the estimated flow pattern of the received information stream and estimating flow pattern comprises monitoring the reception of a flow keyword and the flow keyword optionally comprises a commonly used word.
106. The method of claim 104, wherein the control data comprises at least one parameter selected from the group consisting of: (i) information packet identification, (ii) information source identification, (iii) time of arrival, (iv) alert identification, and (v) query identification.
107. The method of claim 104, wherein the real time term is extracted out of the filtered information packet by parsing and stemming a plurality of information packets; and wherein filtering further comprises a step selected from the group consisting of: (a) discarding a term constructed of a one-letter word, (b) discarding a term constructed of a frequently used word, (c) discarding a term constructed of a stop-word, and (d) discarding a term constructed of a predefined word.
108. The method of claim 104, wherein a reception of the information packet is followed by the steps of: storing the information packet with an associated packet identifier in the storage means, storing a real time term information representative of a reception of the real time term in the storage means, linking the stored information packet and the real time term information, and optionally deleting an information packet followed by deleting the linked real time term information.
109. The method of claim 108, wherein the information packet is stored in a messages hash, and the linked real time term information is stored in a terms hash.
110. The method of claim 109, wherein the real time term information comprises at least one information field selected from the group consisting of: a last modification time field to indicate a most recent time of reception of the real time term during a predetermined period of time; a number of channels containing term, to indicate a number of information sources that provided the real time term during a predetermined period of time; a total instances field to indicate a total amount of receptions of the real time term during a predetermined period of time; and a terms inverted entries map, comprising of a plurality of terms inverted file entries, each entry holds information representative of a reception of the real time term from a single information source during a predetermined period of time.
111. The method of claim 110, wherein each inverted file entry comprises at least one field selected from the group consisting of: a channel identifier to identify the information source that provided the real time term during a predetermined period of time; an instances number to indicate a total amount of receptions of the real time term from an information source during a predetermined period of time; and a time of last appearance to indicate a most recent time of reception of the real time term from an information source during a predetermined period of time.
112. The method of claim 111, wherein the information packet is further associated to a message terms key map, comprising a plurality of message characteristic entries, each message characteristic entry associated to a real time term extracted from the information packet, said message characteristic entry comprises of at least one field selected from the group consisting of: a terms inverted file to point to the term extracted information; an instance number to indicate a number of times the real time term appeared in the information packet; and an inverted file entry to point to a terms inverted file entry.
113. The method of claim 104, wherein the information packet comprises content selected from the group consisting of: text, audio, video, multimedia, and executable code streaming media.
114. The method of claim 102, wherein the current reception pattern reflects the reception of the relevancy keyword during a test period or the reception of the relevancy keyword during at least two test periods.
115. The method of claim 114, wherein the at least two test periods at least partially overlap and optionally each of the at least two test periods is characterized by a corresponding current reception pattern.
116. The method of claim 115, wherein determining relevancy of the relevancy keyword comprises comparisons between each corresponding current reception pattern and the reference reception pattern.
117. The method of claim 116, wherein the determination of the relevancy value is responsive to a combination of at least one comparison.
118. The method of claim 114, wherein the reference reception pattern reflects the reception of the relevancy keyword during a time period that is much longer than each of the test periods.
119. The method of claim 102, wherein determining the relevancy of the relevancy keyword comprises attaching a relevancy level to the relevancy keyword.
120. The method of claim 119, wherein the time period is 24 hours and the relevancy level is selected from the group consisting of −4, −3, −2, −1, 0, 1, 2, 3, and 4, wherein the relevancy level is:
−4 if a 24 hour normalized keyword current reception value (“24 hrv”) is equal to or smaller than avg−std;
−3 if the 24 hrv is greater than avg−std but smaller than or equal to avg−0.8×std;
−2 if the 24 hrv is greater than avg−0.8×std but smaller than or equal to avg−0.65×std;
−1 if the 24 hrv is greater than avg−0.65×std but smaller than or equal to avg−0.5×std;
0 if the 24 hrv is greater than avg−0.5×std but smaller than or equal to $\mathrm{avg} + \left(0.25 + \frac{0.25}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
1 if the 24 hrv is greater than $\mathrm{avg} + \left(0.25 + \frac{0.25}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(0.85 + \frac{0.5}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
2 if the 24 hrv is greater than $\mathrm{avg} + \left(0.85 + \frac{0.5}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.5 + \frac{0.75}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
3 if the 24 hrv is greater than $\mathrm{avg} + \left(1.5 + \frac{0.75}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(2.2 + \frac{1}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$; and
4 if the 24 hrv is greater than $\mathrm{avg} + \left(2.2 + \frac{1}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$.
121. The method of claim 119, wherein the time period is 12 hours and the relevancy level is selected from the group consisting of −4, −3, −2, −1, 0, 1, 2, 3, and 4, wherein the relevancy level is:
−4 if a 12 hour normalized keyword current reception value (“12 hrv”) is equal to or smaller than avg−1.2×std;
−3 if the 12 hrv is greater than avg−1.2×std but smaller than or equal to avg−1×std;
−2 if the 12 hrv is greater than avg−1×std but smaller than or equal to avg−0.85×std;
−1 if the 12 hrv is greater than avg−0.85×std but smaller than or equal to avg−0.7×std;
0 if the 12 hrv is greater than avg−0.7×std but smaller than or equal to $\mathrm{avg} + \left(0.45 + \frac{0.45}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
1 if the 12 hrv is greater than $\mathrm{avg} + \left(0.45 + \frac{0.45}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.05 + \frac{0.7}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
2 if the 12 hrv is greater than $\mathrm{avg} + \left(1.05 + \frac{0.7}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.7 + \frac{0.95}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
3 if the 12 hrv is greater than $\mathrm{avg} + \left(1.7 + \frac{0.95}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(2.4 + \frac{1.2}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$; and
4 if the 12 hrv is greater than $\mathrm{avg} + \left(2.4 + \frac{1.2}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$.
122. The method of claim 119, wherein the relevancy is defined by a relevancy level threshold.
123. The method of claim 102, wherein the current flow pattern is responsive to at least one weight factor associated to at least one source of received information stream.
124. In a computing environment running on a computer platform utilized as a central server system, a method of calculating a relevancy of a relevancy keyword is operating to allow users of client systems connectable thereto to receive indications about the relevancy of the relevancy keyword in response to the reception of real time terms by the central server system, the method comprising: determining a relevancy keyword; extracting a real time term from a currently received information stream; updating a current reception pattern of the relevancy keyword in response to a comparison between the extracted real time term and the relevancy keyword; and determining a relevancy of the relevancy keyword in response to a comparison between the current reception pattern and a reference reception pattern.
125. The method of claim 124, wherein the relevancy keyword is extracted from a client query or an alert criterion of a client.
126. The method of claim 124, further comprising a step selected from the group consisting of: updating at least one client as to the relevancy of the relevancy keyword, estimating a flow pattern of the received information stream to generate an estimated flow pattern, compensating for time differences resulting from a reception of an information stream from a distinct geographical location, and compensating for time differences resulting from a reception of an information stream relating to an event that occurs at a distinct geographical location, and combinations thereof.
127. The method of claim 126, wherein the current reception pattern of the relevancy keyword is further responsive to the estimated flow pattern of the received information stream.
128. The method of claim 126, wherein estimating the flow pattern comprises monitoring the reception of a flow keyword and the flow keyword optionally comprises a commonly used word.
129. The method of claim 124, wherein the information stream comprises content selected from the group consisting of: text, audio, video, multimedia, and executable code streaming media.
130. The method of claim 124, wherein the current reception pattern reflects the reception of the relevancy keyword during a test period or the reception of the relevancy keyword during at least two test periods.
131. The method of claim 130, wherein the at least two test periods at least partially overlap and optionally each of the at least two test periods is characterized by a corresponding current reception pattern.
132. The method of claim 131, wherein determining the relevancy of the relevancy keyword comprises comparisons between each corresponding current reception pattern and the reference reception pattern.
133. The method of claim 132, wherein the determination of relevancy is responsive to a combination of at least one comparison.
134. The method of claim 130, wherein the reference reception pattern reflects the reception of the relevancy keyword during a time period that is much longer than each of the test periods.
135. The method of claim 124, wherein determining the relevancy of the relevancy keyword comprises attaching a relevancy level to the relevancy keyword.
136. The method of claim 135, wherein the time period is 24 hours and the relevancy level is selected from the group consisting of −4, −3, −2, −1, 0, 1, 2, 3, and 4, wherein the relevancy level is:
−4 if a 24 hour normalized keyword current reception value (“24 hrv”) is equal to or smaller than avg−std;
−3 if the 24 hrv is greater than avg−std but smaller than or equal to avg−0.8×std;
−2 if the 24 hrv is greater than avg−0.8×std but smaller than or equal to avg−0.65×std;
−1 if the 24 hrv is greater than avg−0.65×std but smaller than or equal to avg−0.5×std;
0 if the 24 hrv is greater than avg−0.5×std but smaller than or equal to $\mathrm{avg} + \left(0.25 + \frac{0.25}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
1 if the 24 hrv is greater than $\mathrm{avg} + \left(0.25 + \frac{0.25}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(0.85 + \frac{0.5}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
2 if the 24 hrv is greater than $\mathrm{avg} + \left(0.85 + \frac{0.5}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.5 + \frac{0.75}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
3 if the 24 hrv is greater than $\mathrm{avg} + \left(1.5 + \frac{0.75}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(2.2 + \frac{1}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$; and
4 if the 24 hrv is greater than $\mathrm{avg} + \left(2.2 + \frac{1}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$.
137. The method of claim 135, wherein the time period is 12 hours and the relevancy level is selected from the group consisting of −4, −3, −2, −1, 0, 1, 2, 3, and 4, wherein the relevancy level is:
−4 if a 12 hour normalized keyword current reception value (“12 hrv”) is equal to or smaller than avg−1.2×std;
−3 if the 12 hrv is greater than avg−1.2×std but smaller than or equal to avg−1×std;
−2 if the 12 hrv is greater than avg−1×std but smaller than or equal to avg−0.85×std;
−1 if the 12 hrv is greater than avg−0.85×std but smaller than or equal to avg−0.7×std;
0 if the 12 hrv is greater than avg−0.7×std but smaller than or equal to $\mathrm{avg} + \left(0.45 + \frac{0.45}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
1 if the 12 hrv is greater than $\mathrm{avg} + \left(0.45 + \frac{0.45}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.05 + \frac{0.7}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
2 if the 12 hrv is greater than $\mathrm{avg} + \left(1.05 + \frac{0.7}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(1.7 + \frac{0.95}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$;
3 if the 12 hrv is greater than $\mathrm{avg} + \left(1.7 + \frac{0.95}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$ but smaller than or equal to $\mathrm{avg} + \left(2.4 + \frac{1.2}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$; and
4 if the 12 hrv is greater than $\mathrm{avg} + \left(2.4 + \frac{1.2}{\ln(1.05 + \mathrm{avg})}\right) \times \mathrm{std}$.
138. The method of claim 135, wherein the relevancy level is defined by a relevancy level threshold.
139. The method of claim 124, wherein the current flow pattern is responsive to at least one weight factor associated to at least one source of received information stream.